WORD, WHAT IS IT?

As may have been noticed, the term word was used very loosely in the previous sections. Its meaning seems obvious: any language operates with words, and any text or utterance consists of them. This notion seems so simple that, at first glance, it does not require any strict definition or further explanation: one can think that a word is just a substring of the text viewed as a letter string, from one delimiter (usually, a space) to the next one (usually, a space or a punctuation mark). Nevertheless, the situation is not so simple.

Let us consider the Spanish sentence Yo devuelvo los libros el próximo mes, pero tú me devuelves el libro ahora. How many words does it contain? One can say 14 and will be right, since there are exactly 14 letter substrings from one delimiter to another in this sentence. One can also notice that the article el occurs twice, so that the number of different words (substrings) is 13. For these observations, no linguistic knowledge is necessary.
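The purely mechanical counting described above can be reproduced with a short Python sketch. It assumes that a "substring from one delimiter to another" is a maximal run of letters and that capitalization is ignored; the function name letter_substrings is hypothetical and used only for illustration.

```python
import re
import unicodedata

SENTENCE = ("Yo devuelvo los libros el próximo mes, "
            "pero tú me devuelves el libro ahora.")

def letter_substrings(text):
    """Return the maximal runs of letters in text, lowercased."""
    # Compose accented letters into single characters so that "ó" is one letter.
    text = unicodedata.normalize("NFC", text)
    # [^\W\d_] matches a word character that is neither a digit nor "_",
    # i.e. a letter; spaces and punctuation act as delimiters.
    return [s.lower() for s in re.findall(r"[^\W\d_]+", text)]

substrings = letter_substrings(SENTENCE)
print(len(substrings))       # 14 substrings in total
print(len(set(substrings)))  # 13 different substrings ("el" occurs twice)
```

No linguistic knowledge enters this computation: the program only compares strings of characters.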

However, one can also notice that devuelvo and devuelves are forms of the same verb devolver, and libros and libro are forms of the same noun libro, so that the number of different words is only 11. Indeed, the members of each pair of wordforms denote the same action or thing. If one additionally considers the article los essentially equivalent to the article el, ignoring the difference in grammatical number, then there are only 10 different words in this sentence. In all these cases, the “equivalent” strings are to some degree similar in appearance, i.e., they have some letters in common.

Finally, one can consider me the same as yo, but given in the oblique grammatical case, even though these substrings have no letters in common. With such an approach, the total number of different words is nine.
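The successive counts 13, 11, 10, and 9 can be traced with a small Python sketch. The merge tables below are written by hand from the observations in the text; they are an illustrative assumption, not the output of any morphological analyzer, and the function name counts_after_merges is hypothetical.

```python
# The 13 different substrings of the example sentence, lowercased.
WORDFORMS = ["yo", "devuelvo", "los", "libros", "el", "próximo", "mes",
             "pero", "tú", "me", "devuelves", "libro", "ahora"]

# Each step maps some strings onto a representative they are identified with.
MERGE_STEPS = [
    # forms of the same verb and of the same noun
    {"devuelvo": "devolver", "devuelves": "devolver", "libros": "libro"},
    # the article, ignoring grammatical number
    {"los": "el"},
    # the pronoun, ignoring grammatical case
    {"me": "yo"},
]

def counts_after_merges(forms, steps):
    """Count distinct items before any merge and after each cumulative step."""
    mapping = {}
    counts = [len(set(forms))]
    for step in steps:
        mapping.update(step)
        counts.append(len({mapping.get(form, form) for form in forms}))
    return counts

print(counts_after_merges(WORDFORMS, MERGE_STEPS))  # [13, 11, 10, 9]
```

Each figure depends entirely on which strings one decides to treat as equivalent, which is exactly the source of the ambiguity.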

We can conclude from the example that the term word is too ambiguous to be used in a science whose objective is to give a precise description of a natural language. To introduce a more consistent terminology, let us call an individual substring used in a specific place of a text (without taking into account its possible repetitions or similarities to other substrings) a word occurrence. Now we can say that the sentence above consists of 14 word occurrences.

Some of the substrings (usually similar in appearance) have the same core meaning. We intuitively consider them different forms of some common entity. A set of such forms is called a lexeme. For example, in Spanish {libro, libros}, {alto, alta, altos, altas}, and {devolver, devuelvo, devuelves, devuelve, devolvemos...} are lexemes. Indeed, in each set the strings share some of their letters (the commonality being expressed as the patterns libro‑, alt‑, and dev...lv‑), and their meanings are equivalent (namely, ‘book’, ‘high’, and ‘to bring back’, respectively). Each member of such a set, a letter string considered without regard to its position in the text, is called a wordform. Each word occurrence represents a wordform, while wordforms (but not word occurrences) can repeat in the text. Now we can say that the sentence in the example above contains 14 word occurrences, 13 different wordforms, or nine different lexemes. The considerations that gave other figures in the example above are linguistically inconsistent.
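The three notions can be kept apart in a minimal Python sketch. Here a word occurrence is a position in a list, a wordform is a string, and a lexeme is a set of wordforms. The lexeme sets are hand-built and restricted to the forms that appear in the example sentence; this is an illustrative assumption, since real lexemes contain many more forms, and the names LEXEMES and lexeme_of are hypothetical.

```python
from collections import Counter

# Word occurrences: one list item per position in the sentence.
OCCURRENCES = ["yo", "devuelvo", "los", "libros", "el", "próximo", "mes",
               "pero", "tú", "me", "devuelves", "el", "libro", "ahora"]

# Hand-built lexemes, limited to the wordforms present in the sentence.
LEXEMES = [
    frozenset({"yo", "me"}),
    frozenset({"devuelvo", "devuelves"}),
    frozenset({"el", "los"}),
    frozenset({"libro", "libros"}),
]

def lexeme_of(wordform):
    """Return the lexeme containing wordform; unlisted forms stand alone."""
    for lexeme in LEXEMES:
        if wordform in lexeme:
            return lexeme
    return frozenset({wordform})

wordform_counts = Counter(OCCURRENCES)
lexemes_in_text = {lexeme_of(w) for w in OCCURRENCES}

print(len(OCCURRENCES))       # 14 word occurrences
print(len(wordform_counts))   # 13 different wordforms
print(len(lexemes_in_text))   # 9 different lexemes
print(wordform_counts["el"])  # 2: a wordform can repeat, an occurrence cannot
```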

A lexeme is identified by a name. Usually, one of its wordforms, i.e., a specific member of the wordform set, is selected for this purpose. In the previous examples, LIBRO, ALTO, and DEVOLVER are taken as the names of the corresponding lexemes. It is these names that are used as titles of the corresponding entries in the dictionaries mentioned above. The dictionaries cover the available information about the lexemes of a given language, sometimes including morphological information, i.e., information on how the wordforms of these lexemes are constructed. Various dictionaries compiled for the needs of lexicography, dialectology, and sociolinguistics have lexemes, rather than wordforms, as their entries.

Therefore, the term word, as well as its counterparts in other languages, such as Spanish palabra, is too ambiguous to be used in a linguistic book. Instead, we should generally use the terms word occurrence for a specific string in a specific place in the text, wordform for a string regardless of its specific place in any text, and lexeme for a theoretical construction uniting several wordforms corresponding to a common meaning in the manner discussed above.

However, sometimes we will retain the habitual word word when it is obvious which of these more specific terms is actually meant.
