One of my current projects involves natural language processing spell checking and correction. I wondered how difficult it would be to programmatically and automatically correct two accidentally combined words, such as changing “railman” to “rail” and “man”. Answer: It’s really, really difficult. Bottom line: The techniques I tried for correcting combined words did not work great, but they worked well enough, especially when the combined words are spelled correctly.
The ideas are best explained by looking at the output of a demo program:
Sub-words correctly spelled version

Source word = railroadblock

possible correction: rail roadblock
7405 1603

possible correction: railroad block
11998 50684
In this example, the source is “railroadblock”. The behind-the-scenes code assumes that any combined words are correctly spelled. The code tries every possible split point and prints a possible correction whenever both parts of the split are correctly spelled (according to the PySpellChecker library module).
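The split-and-check idea can be sketched without the library. This is a minimal sketch that assumes a plain Python set as a stand-in dictionary; the function name two_word_splits and the toy word list are hypothetical, not part of the demo, which uses PySpellChecker instead.

```python
def two_word_splits(word, dictionary):
    """Return every (left, right) split of word where both halves are in dictionary."""
    splits = []
    for i in range(1, len(word)):  # split points 1..len-1, so neither half is empty
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            splits.append((left, right))
    return splits

# Toy dictionary standing in for PySpellChecker's word list (hypothetical).
vocab = {"rail", "railroad", "road", "roadblock", "block"}
print(two_word_splits("railroadblock", vocab))
# [('rail', 'roadblock'), ('railroad', 'block')]
```

A word of length n has only n - 1 split points, so the brute-force scan is cheap; the hard part is deciding among the survivors.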
OK, not too difficult. But there are two possible separations: “rail roadblock” and “railroad block”. If you want to return the single best separation, you’d have to analyze the frequencies of the separated words: “rail” occurs 7,405 times in the spelling dictionary, “roadblock” occurs 1,603 times, and so on. Do you use the separation with the highest average frequency, the separation with the largest minimum frequency, or something else?
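To make the two ranking heuristics concrete, here is a minimal sketch that scores the candidate splits using the frequencies from the demo output. The helper best_split is a hypothetical name, and average and minimum are just the two options mentioned above, not a recommended rule.

```python
from statistics import mean

# Frequencies copied from the demo output for "railroadblock".
candidates = {
    ("rail", "roadblock"): (7405, 1603),
    ("railroad", "block"): (11998, 50684),
}

def best_split(cands, score):
    """Return the candidate split whose frequency tuple has the highest score."""
    return max(cands, key=lambda split: score(cands[split]))

by_average = best_split(candidates, mean)  # highest average frequency
by_minimum = best_split(candidates, min)   # largest minimum frequency
print(by_average, by_minimum)
```

On this example both heuristics pick “railroad block”. In general they can disagree: the average rewards a split where one part is very common even if the other is rare, while the minimum penalizes any rare part.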
Next, I looked at the scenario where the combined words might be incorrectly spelled:
Sub-words possible incorrect version

Source word = ordcombaned

possible correction: or combined
4128154 9895

possible correction: ordo moaned
51 420
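I set up the example expecting that “ordcombaned” would be separated into “word” and “combined”. But “ord” is corrected to “or” instead of “word”. Both are one edit away from “ord”, and the spell checker picks the candidate with the higher frequency, and “or” is much more common.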
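The tie-breaking behavior can be illustrated with a simplified edit-distance-1 corrector in the style of Peter Norvig’s spelling corrector, which PySpellChecker is based on. This is a sketch, not the library’s actual code. The frequencies for “or” and “ordo” come from the demo output; the value for “word” is a hypothetical placeholder, and any value below 4,128,154 gives the same result.

```python
import string

# Toy frequency table. "or" and "ordo" use counts from the demo output;
# the count for "word" is hypothetical, for illustration only.
FREQ = {"or": 4128154, "ordo": 51, "word": 100000}

def edits1(w):
    """All strings one edit (delete, transpose, replace, insert) away from w."""
    letters = string.ascii_lowercase
    splits = [(w[:i], w[i:]) for i in range(len(w) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(w):
    """Return w if known, else the most frequent known word one edit away, else None."""
    if w in FREQ:
        return w
    candidates = {c for c in edits1(w) if c in FREQ}
    return max(candidates, key=FREQ.get) if candidates else None

print(sorted(c for c in edits1("ord") if c in FREQ))  # ['or', 'ordo', 'word']
print(correct("ord"))  # or
```

All three known words are one edit from “ord” (delete “d”, append “o”, prepend “w”), so the frequency alone decides, and “or” wins.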
By the way, “ordo” is “A musical phrase constructed from one or more statements of a rhythmic mode pattern and ending in a rest”.
Anyway, the bottom line is that natural language processing is very tricky.
The music of the 1960s was very creative. Many rock bands experimented by combining unusual instruments. Here are three songs I like that use a harpsichord. (Actually, for the third song, the band couldn’t find a harpsichord, so they simulated one by recording a piano at slow speed and then speeding the recording back up.)
Different Drum – The Stone Poneys (1967)
Sunshine Superman – Donovan (1966)
See Emily Play – Pink Floyd (1967)
Demo program:
# correct_joined_words.py
# requires: pip install pyspellchecker

from spellchecker import SpellChecker

print("\nBegin correct combined words demo ")

print("\n----------------------------------- ")
print("\nSub-words correctly spelled version ")

# distance=1: only consider candidate words one edit away
spell = SpellChecker(distance=1, case_sensitive=True)
word = "railroadblock"
print("\nSource word = " + str(word))
# "word in spell == True" is a chained comparison
# (word in spell and spell == True), which is always False
if word in spell:
    print("\nSource word is correctly spelled ")

n = len(word)
for x in range(1, n):  # all possible split points
    left = word[0:x]
    right = word[x:n]
    is_left_correct = left in spell
    is_right_correct = right in spell
    if is_left_correct and is_right_correct:
        print("\npossible correction: " + left + \
            " " + right)
        # dictionary frequency counts of each part
        print(str(spell.word_frequency[left]) + \
            " " + str(spell.word_frequency[right]))
print("\n----------------------------------- ")
print("\nSub-words possible incorrect version ")

word = "ordcombaned"
print("\nSource word = " + str(word))
if word in spell:  # not "== True", which would chain the comparison
    print("\nSource word is correctly spelled ")

n = len(word)
for x in range(1, n):  # all possible split points
    left = word[0:x]
    right = word[x:n]
    # correction() returns the most frequent known candidate; it may return
    # None (or the unchanged input) when nothing is found, hence the checks
    left_corrected = spell.correction(left)
    right_corrected = spell.correction(right)
    is_left_correct = left_corrected is not None and left_corrected in spell
    is_right_correct = right_corrected is not None and right_corrected in spell
    if is_left_correct and is_right_correct:
        print("\npossible correction: " + left_corrected + \
            " " + right_corrected)
        print(str(spell.word_frequency[left_corrected]) + \
            " " + str(spell.word_frequency[right_corrected]))

print("\nEnd demo ")