Spelling Correction of Two Accidentally Combined Words Using Python

One of my current projects involves natural language processing spell checking and correction. I wondered how difficult it would be to programmatically correct two accidentally combined words, such as splitting “railman” into “rail” and “man”. Answer: It’s really, really difficult. Bottom line: The techniques I tried for correcting combined words did not work great, but they worked well enough, especially if the combined words are spelled correctly.

The ideas are best explained by looking at the output of a demo program:

Sub-words correctly spelled version

Source word = railroadblock

possible correction: rail roadblock
7405 1603

possible correction: railroad block
11998 50684

In this example, the source is “railroadblock”. The behind-the-scenes code assumes that any combined words are correctly spelled. The code tries all possible split points and prints a possible correction if both halves of the split are correctly spelled (using the PySpellChecker library module).

OK, not too difficult. But there are two possible separations: “rail roadblock” and “railroad block”. If you want to return the best separation, you’d have to analyze the frequencies of the separated words: “rail” has a frequency count of 7,405 in the spelling dictionary, “roadblock” has 1,603, and so on. Do you use the separation with the highest average frequency, or the separation that has the largest minimum frequency, or something else?
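One way to frame the choice is to score each candidate split with a pluggable rule and keep the highest-scoring one. The sketch below is a minimal illustration, not the demo's code: the `freq` dictionary simply hard-codes the four counts shown in the output above, and `best_split` is a hypothetical helper name.

```python
from statistics import mean

# Counts copied from the demo output above; a real program would look these
# up in the spell checker's dictionary instead of hard-coding them.
freq = {"rail": 7405, "roadblock": 1603, "railroad": 11998, "block": 50684}

candidates = [("rail", "roadblock"), ("railroad", "block")]

def best_split(candidates, freq, rule=min):
    """Return the (left, right) pair whose frequencies score highest under rule."""
    return max(candidates, key=lambda pair: rule([freq[w] for w in pair]))

print(best_split(candidates, freq, rule=min))   # largest minimum frequency
print(best_split(candidates, freq, rule=mean))  # highest average frequency
```

For this particular word both rules pick “railroad block”, so the example does not settle which rule is better in general.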

Next, I looked at the scenario where the combined words might be incorrectly spelled:

Sub-words possible incorrect version

Source word = ordcombaned

possible correction: or combined
4128154 9895

possible correction: ordo moaned
51 420

I set up the example thinking that “ordcombaned” would be separated into “word” and “combined”. But “ord” is corrected to “or” instead of “word” because both are a single edit away from “ord” and “or” is much more common.
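The reason “or” wins can be sketched with a toy version of frequency-ranked correction: generate every string within one edit of the fragment, keep those that are known words, and return the most frequent. This is a simplified illustration in the style of Peter Norvig's well-known spelling corrector, not PySpellChecker's actual implementation. The count for “or” comes from the output above, while the count for “word” is a made-up placeholder, and `edits1` and `correct` are hypothetical helper names.

```python
import string

# "or" count is from the demo output; the "word" count is a hypothetical
# placeholder chosen only to be smaller than the "or" count.
freq = {"or": 4128154, "word": 100000}

def edits1(text):
    """All strings one delete, transpose, replace, or insert away from text."""
    letters = string.ascii_lowercase
    splits = [(text[:i], text[i:]) for i in range(len(text) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(text, freq):
    """Return text if known, else the most frequent known word one edit away."""
    if text in freq:
        return text
    known = [w for w in edits1(text) if w in freq]
    return max(known, key=freq.get) if known else None

print(correct("ord", freq))  # both "or" and "word" are one edit away
```

Under these assumptions the function returns “or”, because frequency is the only tie-breaker between the two candidates. Getting “word” would likely need extra context, such as the neighboring words in the sentence.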

By the way, “ordo” is “A musical phrase constructed from one or more statements of a rhythmic mode pattern and ending in a rest”.

Anyway, the bottom line is that natural language processing is very tricky.


The music of the 1960s was very creative. Many rock bands experimented by combining unusual instruments. Here are three songs that I like that use a harpsichord. (Actually, for the third song, the band couldn’t find a harpsichord so they simulated one by recording a piano on slow speed and then speeding the recording back up).


Different Drum – The Stone Poneys (1967)



Sunshine Superman – Donovan (1966)



See Emily Play – Pink Floyd (1967)




Demo program:

# correct_joined_words.py

from spellchecker import SpellChecker

print("\nBegin correct combined words demo ")

print("\n----------------------------------- ")
print("\nSub-words correctly spelled version ")

spell = SpellChecker(distance=1, case_sensitive=True)

word = "railroadblock"
print("\nSource word = " + str(word))

if word in spell:
  print("\nSource word is correctly spelled ")

n = len(word)
for x in range(1,n):  # all possible split points
  left = word[0:x]
  right = word[x:n]

  is_left_correct = left in spell
  is_right_correct = right in spell

  if is_left_correct and is_right_correct:
    print("\npossible correction: " + left + \
    " " + right)
    print(str(spell.word_frequency[left]) + \
    " " + str(spell.word_frequency[right]))


print("\n----------------------------------- ")
print("\nSub-words possible incorrect version ")

word = "ordcombaned"
print("\nSource word = " + str(word))

if word in spell:
  print("\nSource word is correctly spelled ")

n = len(word)
for x in range(1,n):  # all possible split points
  left = word[0:x]
  right = word[x:n]

  left_corrected = spell.correction(left)
  right_corrected = spell.correction(right)

  is_left_correct = left_corrected is not None and left_corrected in spell
  is_right_correct = right_corrected is not None and right_corrected in spell

  if is_left_correct and is_right_correct:
    print("\npossible correction: " + left_corrected + \
    " " + right_corrected)
    print(str(spell.word_frequency[left_corrected]) + \
    " " + str(spell.word_frequency[right_corrected]))

print("\nEnd demo ")
