The objective of spell checking is to detect and correct typographic and orthographic errors in a text at the level of individual word occurrences, each considered out of its context.

Nobody can write without any errors. Even people well acquainted with the rules of a language can accidentally press a wrong key on the keyboard (often one adjacent to the correct one) or leave out a letter. Additionally, when typing, one sometimes does not properly synchronize the movements of the hands and fingers, so that letters come out in the wrong order. All such errors are called typos, or typographic errors. On the other hand, some people do not know the correct spelling of some words, especially in a foreign language. Such errors are called spelling errors.
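The mechanical sources of typos just described map naturally onto simple string operations: substituting a neighboring key, omitting a letter, and swapping two adjacent letters. The following is a minimal sketch, assuming a tiny hypothetical fragment of a QWERTY adjacency map (`ADJACENT`); the function names are illustrative only.

```python
# Hypothetical fragment of a QWERTY adjacency map (only a few keys shown).
ADJACENT = {"o": "ipkl", "u": "yhji", "e": "wrsd", "a": "qwsz"}


def adjacent_key_typos(word: str) -> set[str]:
    """Replace one letter by a neighboring key: pressing a wrong, adjacent key."""
    return {
        word[:i] + key + word[i + 1:]
        for i, ch in enumerate(word)
        for key in ADJACENT.get(ch, "")
    }


def omission_typos(word: str) -> set[str]:
    """Drop one letter: missing out a letter."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def transposition_typos(word: str) -> set[str]:
    """Swap two adjacent, different letters: hands and fingers out of sync."""
    return {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
        if word[i] != word[i + 1]
    }
```

For the word group, these functions produce, among others, groip (u typed as the adjacent i), grup (o omitted), and gorup (r and o swapped). Spelling errors such as *groop are of a different nature and are not generated by this keyboard model.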

First, a spell checker merely detects the strings that are not correct words in a given natural language. It is assumed that most orthographic or typographic errors produce strings that cannot occur as separate words in this language. Detecting errors that accidentally convert one word into another existing word, such as English then → than or Spanish cazar → casar, is a much harder task that requires far more powerful tools.
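In its simplest form, this detection step is a dictionary lookup for each word of the text. A minimal sketch, assuming a hypothetical toy lexicon in place of a full word list for the language; note that it cannot catch real-word errors such as then → than, since both strings are valid words.

```python
import re

# Hypothetical toy lexicon; a real spell checker uses a full word list for the language.
LEXICON = {"we", "are", "a", "great", "group", "then", "than", "misunderstand"}


def detect_errors(text: str, lexicon: set[str]) -> list[str]:
    """Return the word tokens of text that are not in the lexicon (case-insensitive)."""
    # [^\W\d_] matches Unicode letters only, so accented letters such as ó are kept.
    words = re.findall(r"[^\W\d_]+", text)
    return [w for w in words if w.lower() not in lexicon]
```

For example, `detect_errors("We are a great groop", LEXICON)` flags only groop, while `detect_errors("then than", LEXICON)` flags nothing.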

After such an impossible string has been detected and highlighted by the program, the user can correct it in any preferred way: manually or with the help of the program. For example, if we insert the strings[3] *groop, *greit, or *misanderstand into an English text, the spell checker will detect the error, stop at the string, and highlight it for the user. Analogous Spanish examples are *caió, *systema, and *nesecitar.

The functions of a spell checker can be more versatile. The program can also propose a set of existing words that are similar enough (in some sense) to the corrupted word, and the user can then choose one of them as the correct version of the word without retyping it. For the string *caió from the examples above, Microsoft Word's spell checker offers the existing Spanish words shown in Figure III.1 as replacement candidates.
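One common way to make "similar enough" precise (not necessarily the one Microsoft Word uses) is to generate every string at Damerau–Levenshtein distance 1 from the corrupted string, that is, one deletion, transposition of adjacent letters, substitution, or insertion away, and keep those found in the dictionary. A minimal sketch, assuming a hypothetical toy Spanish lexicon and alphabet:

```python
# Hypothetical alphabet: English letters plus the extra Spanish ones.
ALPHABET = "abcdefghijklmnopqrstuvwxyzáéíóúüñ"

# Hypothetical toy lexicon standing in for a full Spanish word list.
LEXICON = {"cayó", "cazó", "sistema", "necesitar"}


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    replaces = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def suggest(word: str, lexicon: set[str]) -> list[str]:
    """Candidate corrections: lexicon words within one edit of the corrupted string."""
    return sorted(w for w in edits1(word.lower()) if w in lexicon)
```

With this toy lexicon, `suggest("systema", LEXICON)` returns only sistema, and `suggest("caió", LEXICON)` returns cayó and cazó. The string *nesecitar yields no candidates, because reaching necesitar takes two substitutions; a practical corrector would extend the search to distance 2 or weight edits by error likelihood.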

In most cases, especially for long strings, a spell checker offers only one or two candidates (or none). For example, for the string *systema it offers only the correct Spanish word sistema.

Programs that perform both kinds of operations are sometimes called orthographic correctors, though in English they are usually called spell checkers. In everyday practice, spell checkers are considered very helpful and are used by millions of people throughout the world. Most modern text editors now come with integrated spell checkers. Microsoft Word, for example, uses a separate spell checker for each natural language used in the text.

The amount of linguistic information necessary for spell checkers is much greater than for hyphenation. A simple but very resource-consuming approach uses a list, or dictionary, of all valid words of a specific language. A criterion of word similarity is also needed, along with some assumptions about the most common typographic and spelling errors. A deeper treatment of correction requires detailed knowledge of morphology, which makes it possible to build a more compact dictionary of manageable size.
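To see how morphology shrinks the dictionary, consider storing each stem once together with the set of suffixes its paradigm allows, instead of listing every inflected form. A minimal sketch, assuming hypothetical toy stems and a drastically simplified suffix set for regular English verbs; it ignores spelling changes such as bake → baked or stop → stopped and all irregular forms.

```python
# Hypothetical paradigm: suffixes shared by regular English verbs in this toy model.
REGULAR_VERB_SUFFIXES = ("", "s", "ed", "ing")

# Hypothetical stem list: one entry per verb instead of one per word form.
VERB_STEMS = {"walk", "jump", "play", "reach"}


def is_valid(word: str) -> bool:
    """Accept word if it splits into a known stem plus a suffix from its paradigm."""
    for suffix in REGULAR_VERB_SUFFIXES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if stem in VERB_STEMS:
                return True
    return False
```

Here 4 stems and 4 suffixes cover 16 word forms. The saving grows with the richness of the inflection, which is why it matters much more for Spanish verbs, with dozens of forms each, than for English ones.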

FIGURE III.1. Alternatives for the word *caió.

Spell checkers have been available for more than 20 years, but some rather obvious correction tasks, even for words taken in isolation, have not yet been solved. For example, consider the ungrammatical string *teached in an English text. None of the spell checkers we have tried suggested the correct form taught. Similarly, if a foreigner inserts strings such as *muestrar or *disponido into a Spanish text, the Spanish spell checkers we have tried did not offer the forms mostrar and dispuesto as possible corrections.
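Such corrections become possible once morphology is used for suggestions and not only for validation: *teached is a well-formed regular past form of the existing verb teach, so the checker can look up that verb's actual irregular forms. A minimal sketch, assuming a hypothetical toy table of irregular English verbs and handling only the -ed pattern:

```python
# Hypothetical toy table: lemma -> (past tense, past participle) for irregular verbs.
IRREGULAR = {
    "teach": ("taught", "taught"),
    "bring": ("brought", "brought"),
    "go": ("went", "gone"),
}


def suggest_irregular(word: str) -> list[str]:
    """If word looks like a regular -ed form of an irregular verb, return its real forms."""
    if word.endswith("ed"):
        lemma = word[:-2]
        if lemma in IRREGULAR:
            past, participle = IRREGULAR[lemma]
            return sorted({past, participle})
    return []
```

With this table, `suggest_irregular("teached")` returns taught and `suggest_irregular("goed")` returns gone and went. The Spanish cases would need analogous rules for stem alternation (muestro but mostrar) and irregular participles (disponer → dispuesto).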


