Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Looking for Topic Similarities of Highly Inflected Languages for Language Model Adaptation

Mirjam Sepesy Maucec, Zdravko Kacic, Bogomir Horvat

University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

In this paper, we propose a new framework for constructing corpus-based, topic-sensitive language models of highly inflected languages for large-vocabulary speech recognition. We concentrate on the feature extraction process for languages in which words are formed by many different inflectional affixations. In our approach, all words with the same meaning but different grammatical forms are automatically collected into one cluster by using a fuzzy comparison function. Using a topic classifier, a sub-corpus is selected from a large collection of training text. Language models are built by interpolating topic-specific models with a general model. Results of experiments on English and Slovenian corpora are reported.
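
The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes simple linear interpolation of unigram models with fixed weights; the abstract does not state the model order, the interpolation scheme, or how the weights are estimated, so the function names (`unigram_model`, `interpolate`), the weights, and the toy sentences below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weight_i * P_i(w).

    Assumption: weights are fixed by hand here; the paper may estimate
    them differently.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Union of the vocabularies of all component models.
    vocabulary = set().union(*models)
    return {
        word: sum(w * m.get(word, 0.0) for m, w in zip(models, weights))
        for word in vocabulary
    }


# Toy corpora, invented for illustration only.
general = unigram_model("the game was on the news the market fell".split())
topic = unigram_model("the market fell the shares fell".split())

# Hypothetical weights: 0.7 for the topic-specific model, 0.3 for the general one.
adapted = interpolate([topic, general], [0.7, 0.3])
```

Because the weights sum to one and each component is a proper distribution, the adapted model is also a proper distribution over the combined vocabulary. Words seen only in the general corpus keep a small, nonzero probability in the adapted model.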

Bibliographic reference.  Maucec, Mirjam Sepesy / Kacic, Zdravko / Horvat, Bogomir (2000): "Looking for topic similarities of highly inflected languages for language model adaptation", In ICSLP-2000, vol.2, 891-894.