7th International Conference on Spoken Language Processing
September 16-20, 2002
In this paper we study several ways to enhance language modeling for inflectional languages, namely Czech. We show that some existing smoothing techniques can be further improved to cope with extremely sparse data. We propose several schemes for combining word-based and class-based language models. In our approach, the classes are defined by morphological categories, and the syntactic relations between them are modeled with bigrams. In speech recognition experiments, combining word bigrams with class statistics yielded a moderate performance improvement.
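The abstract does not say how the word and class models are combined, so the sketch below uses one common textbook scheme as an assumption, not the authors' method. It linearly interpolates a word bigram with a class bigram, p(w | v) = λ·P(w | v) + (1 − λ)·P(c(w) | c(v))·P(w | c(w)). The function name `train_interpolated_lm`, the fixed weight `lam`, the tag strings and the toy corpus are all hypothetical. All estimates are unsmoothed maximum likelihood, for brevity.

```python
from collections import Counter
from typing import Callable, Dict, List


def train_interpolated_lm(
    sentences: List[List[str]],
    word_class: Dict[str, str],
    lam: float = 0.7,
) -> Callable[[str, str], float]:
    """Hypothetical sketch: word bigram interpolated with a class bigram.

    p(w | v) = lam * P(w | v) + (1 - lam) * P(c(w) | c(v)) * P(w | c(w))
    Plain maximum-likelihood estimates; no smoothing (an assumption).
    """

    def cls(token: str) -> str:
        # Tokens without a tag (e.g. the sentence start "<s>") form their own class.
        return word_class.get(token, token)

    word_bi, word_ctx = Counter(), Counter()
    class_bi, class_ctx = Counter(), Counter()
    word_count, class_count = Counter(), Counter()

    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, w in zip(tokens, tokens[1:]):
            word_bi[prev, w] += 1          # word bigram counts
            word_ctx[prev] += 1            # word history counts
            class_bi[cls(prev), cls(w)] += 1
            class_ctx[cls(prev)] += 1
            word_count[w] += 1             # used for P(w | c(w))
            class_count[cls(w)] += 1

    def prob(w: str, prev: str) -> float:
        p_word = word_bi[prev, w] / word_ctx[prev] if word_ctx[prev] else 0.0
        cp, c = cls(prev), cls(w)
        p_class = class_bi[cp, c] / class_ctx[cp] if class_ctx[cp] else 0.0
        p_emit = word_count[w] / class_count[c] if class_count[c] else 0.0
        return lam * p_word + (1 - lam) * p_class * p_emit

    return prob


# Hypothetical toy corpus; the tags encode only part of speech and gender.
TAGS = {"velký": "A-M", "velká": "A-F", "malá": "A-F",
        "dům": "N-M", "kočka": "N-F", "myš": "N-F"}
SENTENCES = [["velký", "dům"], ["malá", "kočka"], ["velká", "myš"]]

lm = train_interpolated_lm(SENTENCES, TAGS, lam=0.7)
print(lm("myš", "malá"))   # unseen word pair, licensed by the A-F -> N-F class bigram (about 0.15)
print(lm("myš", "velký"))  # masculine adjective + feminine noun gets no class support (0.0)
```

The toy run illustrates why morphological classes help with sparse data. The word pair "malá myš" never occurs in training, yet it still receives probability because feminine adjectives are seen followed by feminine nouns. A gender-mismatched pair receives none. A real system would also smooth both components and tune λ on held-out data.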
Bibliographic reference. Nouza, Jan / Drabkova, Jindra (2002): "Combining lexical and morphological knowledge in language model for inflectional (Czech) language", In ICSLP-2002, 705-708.