Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Looking for Topic Similarities of Highly Inflected Languages for Language Model Adaptation
Mirjam Sepesy Maucec, Zdravko Kacic, Bogomir Horvat
University of Maribor, Faculty of Electrical Engineering and Computer Science,
In this paper, we propose a new framework for constructing
corpus-based, topic-sensitive language models of highly
inflected languages for large-vocabulary speech recognition.
We concentrate on the feature extraction process for
languages in which words are formed with many different inflectional
affixes. In our approach, all words with
the same meaning but different grammatical forms are collected
automatically into one cluster by using a fuzzy comparison
function. A topic classifier is then used to select a sub-corpus
from a large collection of training text. Language models are
built by interpolating topic-specific models with a general
model. Results of experiments on English and Slovenian
corpora are reported.
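
The interpolation step mentioned above can be illustrated with a minimal sketch. It assumes plain linear interpolation of unigram models with fixed weights; the abstract does not state the model order, the interpolation scheme, or how the weights are chosen, so the function names, the toy corpora, and the weights 0.7 and 0.3 below are all hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Linearly interpolate models: P(w) = sum_i weights[i] * P_i(w).

    Assumption: the weights are fixed by hand and sum to 1. In practice
    they would normally be estimated on held-out data.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    vocabulary = set().union(*models)
    return {
        word: sum(w * m.get(word, 0.0) for w, m in zip(weights, models))
        for word in vocabulary
    }


# Toy data standing in for the general corpus and a topic sub-corpus.
general = unigram_model("the cat sat on the mat".split())
topic = unigram_model("the court ruled on the appeal".split())

# Hypothetical weights: 0.7 for the topic model, 0.3 for the general model.
adapted = interpolate([topic, general], [0.7, 0.3])
```

With these weights, words seen only in the topic sub-corpus keep 70% of their topic-model probability, while words seen only in the general corpus retain some mass, so the adapted model still covers the general vocabulary.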
Maucec, Mirjam Sepesy / Kacic, Zdravko / Horvat, Bogomir (2000):
"Looking for topic similarities of highly inflected languages for language model adaptation",
In ICSLP-2000, vol.2, 891-894.