ISCA Archive ICSLP 2000

Looking for topic similarities of highly inflected languages for language model adaptation

Mirjam Sepesy Maucec, Zdravko Kacic, Bogomir Horvat

In this paper, we propose a new framework for constructing corpus-based, topic-sensitive language models of highly inflected languages for large-vocabulary speech recognition. We concentrate on the feature extraction process for languages in which words are formed with many different inflectional affixations. In our approach, all words with the same meaning but different grammatical forms are automatically collected into one cluster using a fuzzy comparison function. A topic classifier is used to select a sub-corpus from a large collection of training text. Language models are built by interpolating topic-specific models with a general model. Results of experiments on English and Slovenian corpora are reported.
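The abstract names three steps: clustering inflected word forms with a fuzzy comparison function, selecting a topic-related sub-corpus with a topic classifier, and interpolating topic-specific and general language models. The paper's exact comparison function, classifier, and interpolation scheme are not given here, so the following is only a minimal Python sketch under stated assumptions. A hypothetical shared-prefix test (`fuzzy_match`, with illustrative thresholds `min_prefix` and `max_suffix`) stands in for the fuzzy comparison function. Cosine similarity over cluster counts stands in for the classifier's similarity measure. Models are combined by linear interpolation with a single weight `lam` over unigram distributions. All function names are hypothetical.

```python
from collections import Counter
import math

def common_prefix_len(a: str, b: str) -> int:
    """Length of the prefix shared by two word forms."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def fuzzy_match(a: str, b: str, min_prefix: int = 4, max_suffix: int = 3) -> bool:
    """Hypothetical fuzzy comparison (an assumption, not the paper's function):
    two forms count as one unit if they share at least min_prefix characters
    and the longer form has at most max_suffix characters beyond that prefix."""
    n = common_prefix_len(a, b)
    return n >= min_prefix and max(len(a), len(b)) - n <= max_suffix

def cluster_words(words):
    """Greedy clustering: each distinct word joins the first cluster whose
    representative it fuzzily matches; otherwise it starts a new cluster."""
    clusters = {}  # representative -> list of member forms
    for w in dict.fromkeys(words):  # distinct words, first-seen order
        for rep in clusters:
            if fuzzy_match(rep, w):
                clusters[rep].append(w)
                break
        else:
            clusters[w] = [w]
    return clusters

def cluster_features(tokens, clusters):
    """Map each token to its cluster representative and count occurrences."""
    form_to_rep = {m: rep for rep, members in clusters.items() for m in members}
    return Counter(form_to_rep.get(t, t) for t in tokens)

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def unigram(tokens):
    """Maximum-likelihood unigram distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_topic, p_general, lam=0.5):
    """Linear interpolation: P(w) = lam * P_topic(w) + (1 - lam) * P_general(w)."""
    vocab = set(p_topic) | set(p_general)
    return {w: lam * p_topic.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
            for w in vocab}

# Usage with Slovenian forms of "jezik" (language) and "govor" (speech).
words = ["jezik", "govor", "jezika", "govoru", "jeziku", "jezikom"]
clusters = cluster_words(words)
print(clusters)
# {'jezik': ['jezik', 'jezika', 'jeziku', 'jezikom'], 'govor': ['govor', 'govoru']}

doc1 = ["jezika", "jeziku", "govor"]
doc2 = ["jezikom", "govoru"]
sim_clustered = cosine(cluster_features(doc1, clusters), cluster_features(doc2, clusters))
sim_raw = cosine(Counter(doc1), Counter(doc2))  # no shared surface forms

p = interpolate(unigram(["jezik", "jezik", "govor"]),
                unigram(["jezik", "beseda", "beseda", "govor"]), lam=0.5)
```

The two documents share no surface forms, so their raw cosine similarity is zero, while the clustered features make them similar. This is the kind of effect inflection-aware clustering is meant to have on topic similarity. The greedy clustering is order-dependent (the first form seen becomes the representative) and is used here only for brevity.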


Cite as: Maucec, M.S., Kacic, Z., Horvat, B. (2000) Looking for topic similarities of highly inflected languages for language model adaptation. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 891-894

@inproceedings{maucec00_icslp,
  author={Mirjam Sepesy Maucec and Zdravko Kacic and Bogomir Horvat},
  title={{Looking for topic similarities of highly inflected languages for language model adaptation}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 2, 891-894}
}