RELATED SYSTEMS

Some other types of applications are not usually considered systems of computational linguistics proper, yet rely heavily on linguistic methods to accomplish their tasks. We mention two of them here, both related to pattern recognition.

Optical character recognition systems recognize the graphemes, i.e., letters, numbers, and punctuation marks, in a point-by-point image of an arbitrary text printed on paper, and convert them to the corresponding ASCII codes. The graphemes can be in any font, typeface or size; the background of the paper can contain some dots and spots. An example of what the computer sees is given in Figure III.6.

A human being easily reads the following text: Reglas de interpretación semántica. This is because he or she understands the meaning of the words. Without understanding the meaning, however, it is not possible to recognize, say, the first letter of the last string (is it a, s or g?), or the first letter(s) of the second line (is it r, i or m?).

FIGURE III.6. The image of a text, as the computer sees it.

FIGURE III.7. Several letters of the same text, as the computer sees them.

The first letters of the second line are shown separately in Figure III.7. One who does not know what the whole word means cannot even say for sure that this picture represents any letters. However, one can easily read precisely the same image of the same letters when it is shown in its complete context, as above. Hence, the task of optical character recognition cannot be solved by the methods of image recognition alone, without linguistic information.

Image recognition proper is beyond the competence of computational linguistics. However, even after the recognition of an image of much higher quality than the one shown in Figure III.6, some peculiar errors can still appear in the textual representation of the image. They can be fixed by operations similar to those of a spell checker. Such a specialized spell checker should know the most frequent errors of recognition. For example, the lower-case letter l is very similar to the digit 1, the letter n is frequently recognized as the pair ii, while the letter m can be recognized as iii or rn. Conversely, the digraphs in, rn and ni are frequently recognized as m, and so forth.

Most such errors can be corrected without human intervention, on the basis of linguistic knowledge. In the simplest case, such knowledge is just a dictionary of the words existing in the language. However, in some cases a deeper linguistic analysis is necessary for disambiguation. For example, only full parsing of the context sentence can allow the program to decide whether the picture recognized as *danios actually represents the existing Spanish word darnos or damos.
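The simplest, dictionary-based case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular OCR system: the names CONFUSION_CLASSES, candidates and correct are hypothetical, the confusion classes merely group the look-alike strings mentioned above (treating each group as symmetric is an assumption), and only one substitution per word is tried.

```python
# Groups of strings that a recognizer may confuse with one another.
# Assumption: each group is symmetric, i.e. any member may have been
# printed where any other member was recognized.
CONFUSION_CLASSES = [
    {"l", "1"},
    {"n", "ii"},
    {"m", "iii", "rn", "in", "ni"},
]


def candidates(word):
    """Return all strings obtained from word by one confusion substitution."""
    results = set()
    for group in CONFUSION_CLASSES:
        for seen in group:
            start = word.find(seen)
            while start != -1:
                # Replace this occurrence with every other member of the group.
                for intended in group - {seen}:
                    results.add(word[:start] + intended + word[start + len(seen):])
                start = word.find(seen, start + 1)
    return results


def correct(word, lexicon):
    """Return the sorted list of dictionary words the recognized word may stand for."""
    lexicon = set(lexicon)
    if word in lexicon:
        return [word]
    return sorted(candidates(word) & lexicon)


# A toy lexicon; a real system would use a full dictionary of the language.
lexicon = {"damos", "darnos", "luna"}
print(correct("danios", lexicon))
print(correct("1una", lexicon))
```

For the recognized string danios, the dictionary alone leaves both damos and darnos as candidates. Choosing between them is exactly the step that requires analysis of the context sentence, which this sketch does not attempt.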

A much more challenging task than the recognition of printed texts is handwriting recognition. It is the conversion into ASCII form of texts written by hand with a pen on paper or on the surface of a special computer device, or directly with a mouse on the computer screen. However, the main types of problems and the methods of solution for this task are nearly the same as for printed texts, at least in their linguistic aspect.

Speech recognition is another type of recognition task employing linguistic methods. A speech recognition system recognizes specific sounds in the flow of human speech and then converts them into the ASCII codes of the corresponding letters. The task of recognition itself belongs both to pattern recognition and to phonology, the science bordering on linguistics, acoustics, and physiology, which investigates the sounds used in speech.

The difficulties in the task of speech recognition are very similar to, or exactly the same as, those in optical character recognition: mutilated patterns, fused patterns, disjoint parts of a pattern, lost parts of the pattern, and noise superimposed on the pattern. This leads to a much larger number of incorrectly recognized letters than with optical character recognition, so the application of linguistic methods, generally in the same manner, is even more important for this task.
