5th International Conference on Spoken Language Processing
It is well known that good language models improve the performance of speech recognition. Estimating a language model requires a sufficient amount of text from the application domain. If some words of the domain do not occur in the language-model training texts, a way must be found to model these words adequately. In this paper we report on a new approach to building word classes for language modeling in the bilingual (German, Italian) SpeeData project. The main idea is to classify words according to their morphological properties. To this end, we decompose words into their morphological units and put words with the same prefix or suffix into the same class. Since morphological decomposition is error-prone for unknown word stems, we also decompose words by counting word beginnings and endings of different lengths and use these subunits like prefixes and suffixes. The advantage of this second approach is that it can be carried out fully automatically. We reduced the error rate from 9.83 % to 5.77 % with morphological decomposition and to 5.99 % with automatic decomposition, which requires no morphological knowledge.
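The abstract does not say exactly how the counted beginnings and endings are turned into classes. The Python sketch below illustrates the general idea under assumptions of our own, not the authors' method: affixes up to a hypothetical `max_len` characters are counted over distinct words, at least one character is always left as a stem, and each word joins the class of its most frequent affix shared by at least `min_count` words (the longer affix wins ties). The function names `count_affixes` and `classify_by_affix` and both parameters are illustrative only.

```python
from collections import Counter, defaultdict


def affix(word: str, n: int, side: str) -> str:
    """Return the first (prefix) or last (suffix) n characters of word."""
    return word[:n] if side == "prefix" else word[-n:]


def count_affixes(words, max_len=4, side="suffix"):
    """Count, for every beginning/ending of length 1..max_len, how many
    distinct words carry it. At least one character is kept as a stem."""
    counts = Counter()
    for w in set(words):
        for n in range(1, min(max_len, len(w) - 1) + 1):
            counts[affix(w, n, side)] += 1
    return counts


def classify_by_affix(words, side="suffix", max_len=4, min_count=2):
    """Group words into classes keyed by a frequent beginning or ending.

    Hypothetical selection rule: among a word's affixes shared by at least
    min_count words, pick the most frequent one, preferring the longer
    affix on ties. Words without such an affix go to the class "<none>".
    """
    counts = count_affixes(words, max_len, side)
    classes = defaultdict(set)
    for w in set(words):
        candidates = [
            affix(w, n, side)
            for n in range(1, min(max_len, len(w) - 1) + 1)
            if counts[affix(w, n, side)] >= min_count
        ]
        if candidates:
            best = max(candidates, key=lambda a: (counts[a], len(a)))
            label = best + "-" if side == "prefix" else "-" + best
        else:
            label = "<none>"
        classes[label].add(w)
    return dict(classes)


if __name__ == "__main__":
    vocab = ["machen", "lachen", "sagen", "fragen", "parlare", "cantare", "haus"]
    for label, members in sorted(classify_by_affix(vocab).items()):
        print(label, sorted(members))
```

On this toy German/Italian vocabulary the sketch prints the classes `-are` (cantare, parlare), `-en` (fragen, lachen, machen, sagen) and `<none>` (haus). Preferring the most frequent ending over the longest one keeps the split at `-en` rather than at the less general `-chen` or `-agen`. A real system would tune the length limit and threshold on the training texts.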
Bibliographic reference. Uebler, Ulla / Niemann, Heinrich (1998): "Morphological modeling of word classes for language models", In ICSLP-1998, paper 0338.