7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Combining Lexical and Morphological Knowledge in Language Model for Inflectional (Czech) Language

Jan Nouza, Jindra Drabkova

Technical University of Liberec, Czech Republic

In this paper we study several ways to enhance language modeling for inflectional languages, namely Czech. We show that some existing smoothing techniques can be further improved to cope with extremely sparse data. We propose several schemes for combining word-based and class-based language models. In our approach, the classes are defined with respect to morphological categories, and their syntactic relations are evaluated through bigrams. In speech recognition experiments, combining word bigrams with class statistics yielded a moderate performance improvement.
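
The abstract does not say how the word and class statistics are combined. As a rough illustration only, the sketch below uses plain linear interpolation of a word bigram with a class bigram, which is one common way to merge such models and not necessarily the authors' scheme. The toy sentences, the morphological tag names, the `CombinedBigram` class, and the weight `lam` are all assumptions made for this example.

```python
from collections import Counter


def train_bigram(sequences):
    """Count bigrams and their histories over a list of token sequences."""
    bigrams, histories = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
    return bigrams, histories


def mle(bigrams, histories, prev, cur):
    """Maximum-likelihood bigram probability; 0.0 if the history is unseen."""
    return bigrams[(prev, cur)] / histories[prev] if histories[prev] else 0.0


class CombinedBigram:
    """Hypothetical interpolation of a word bigram and a class bigram.

    P(w | v) = lam * P_word(w | v) + (1 - lam) * P_class(c(w) | c(v)) * P(w | c(w))
    """

    def __init__(self, sentences, word2class, lam=0.7):
        self.word2class = word2class
        self.lam = lam
        self.w_bi, self.w_hist = train_bigram(sentences)
        # Map every sentence to its class (morphological tag) sequence.
        class_seqs = [[word2class[w] for w in s] for s in sentences]
        self.c_bi, self.c_hist = train_bigram(class_seqs)
        # Unigram counts used for the class membership term P(w | c(w)).
        self.w_count = Counter(w for s in sentences for w in s)
        self.c_count = Counter(c for s in class_seqs for c in s)

    def prob(self, prev, cur):
        p_word = mle(self.w_bi, self.w_hist, prev, cur)
        c_prev, c_cur = self.word2class[prev], self.word2class[cur]
        p_class = mle(self.c_bi, self.c_hist, c_prev, c_cur)
        p_member = self.w_count[cur] / self.c_count[c_cur] if self.c_count[c_cur] else 0.0
        return self.lam * p_word + (1 - self.lam) * p_class * p_member


# Toy data; the tags are invented labels, not the paper's tag set.
sentences = [
    ["<s>", "žena", "čte", "</s>"],
    ["<s>", "muž", "píše", "</s>"],
]
word2class = {
    "<s>": "BOS", "</s>": "EOS",
    "žena": "NOUN_NOM_SG", "muž": "NOUN_NOM_SG",
    "čte": "VERB_3SG", "píše": "VERB_3SG",
}

model = CombinedBigram(sentences, word2class, lam=0.7)
print(model.prob("žena", "čte"))   # word pair seen in training
print(model.prob("žena", "píše"))  # unseen word pair, backed by class statistics
```

In this toy setup the pair "žena píše" never occurs in the training sentences, so the word bigram alone would assign it zero probability. The class term still gives it some mass because a noun followed by a verb has been observed, which is the kind of help class statistics can offer when word-level data are sparse.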

Bibliographic reference. Nouza, Jan / Drabkova, Jindra (2002): "Combining lexical and morphological knowledge in language model for inflectional (Czech) language", in ICSLP-2002, 705-708.