Lemmatization is the process of mapping the different inflected forms of a word to its basic form, so that they can be recognized as instances of the same word.

Many languages inflect words. For example, the English verb to work may appear as work, worked, works, or working. German, French, Russian and other languages typically have even more forms (for the German verb arbeiten, to work: arbeiten, arbeitete, arbeitetest, arbeiteten, arbeite, arbeitest, gearbeitet, etc.). The basic form, work in English or arbeiten in German, is called the lemma of the word.

Lemmatization provides standardized access to texts, sentences and words. It is particularly helpful for detecting keywords in search and for configuring intelligent interfaces to knowledge sources.

The Lingenio lemmatizer converts the sentences of a text into lists of the basic forms of the words they contain. Alternatively, it annotates each word with its basic form.
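
The two output modes described above can be illustrated with a minimal Python sketch. It does not use or reproduce the Lingenio lemmatizer: the lookup table LEMMAS and the functions lemma_of, lemmatize and annotate are hypothetical names invented for this illustration, and the table covers only the example forms mentioned earlier. A real lemmatizer relies on morphological analysis and context rather than a small dictionary.

```python
import re

# Hypothetical toy table (inflected form -> lemma), for illustration only.
LEMMAS = {
    "work": "work", "works": "work", "worked": "work", "working": "work",
    "arbeiten": "arbeiten", "arbeitete": "arbeiten", "arbeitetest": "arbeiten",
    "arbeiteten": "arbeiten", "arbeite": "arbeiten", "arbeitest": "arbeiten",
    "gearbeitet": "arbeiten",
}


def lemma_of(word):
    """Return the lemma of a word; unknown words fall back to their lowercase form."""
    key = word.lower()
    return LEMMAS.get(key, key)


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)


def lemmatize(text):
    """Mode 1: turn each sentence into a list of lemmas."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [[lemma_of(w) for w in tokenize(s)] for s in sentences]


def annotate(text):
    """Mode 2: pair each word with its lemma."""
    return [(w, lemma_of(w)) for w in tokenize(text)]


if __name__ == "__main__":
    print(lemmatize("She works. He worked."))
    print(annotate("Sie arbeitete"))
```

Because works and worked both map to work, a keyword search for work would match either sentence in this sketch.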


