EUROSPEECH 2003 - INTERSPEECH 2003
Building a stochastic language model (LM) for speech recognition requires a large corpus from the target task. For some tasks, no sufficiently large corpus is available, and this is an obstacle to achieving high recognition accuracy. In this paper, we propose a method that uses large corpora from different tasks to build an LM with higher predictive power than an LM estimated from a small corpus for the specific target task. In our experiment, we used transcriptions of air university lectures and articles from the Nikkei newspaper, and compared an existing interpolation-based method with our new method. The results show that our new method reduces perplexity by 9.71%.
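For readers unfamiliar with the interpolation-based baseline and the perplexity measure mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes add-alpha smoothed unigram models, a fixed interpolation weight `lam`, and invented toy sentences; all function and variable names are hypothetical. The relative reduction at the end is one conventional way to express a perplexity improvement as a percentage; the abstract does not spell out its own formula.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_target, p_general, lam):
    """Linear interpolation: lam * P_target(w) + (1 - lam) * P_general(w)."""
    return {w: lam * p_target[w] + (1.0 - lam) * p_general[w] for w in p_target}


def perplexity(model, tokens):
    """exp of the negative mean log-probability of the tokens."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data (invented for illustration only, not from the paper).
target_train = "the lecture covers language models".split()
general_train = "the newspaper covers the economy and the market".split()
held_out = "the lecture covers the economy".split()
vocab = sorted(set(target_train) | set(general_train) | set(held_out))

p_target = unigram_model(target_train, vocab)    # small in-task corpus
p_general = unigram_model(general_train, vocab)  # large out-of-task corpus
p_mix = interpolate(p_target, p_general, lam=0.5)

ppl_target = perplexity(p_target, held_out)
ppl_mix = perplexity(p_mix, held_out)

# Relative change in percent; positive means the mixture has lower perplexity.
relative_reduction = 100.0 * (ppl_target - ppl_mix) / ppl_target
print(f"target-only: {ppl_target:.3f}  interpolated: {ppl_mix:.3f}  "
      f"relative reduction: {relative_reduction:.2f}%")
```

In practice the weight `lam` would be tuned on held-out data from the target task rather than fixed at 0.5, and the component models would be higher-order n-grams rather than unigrams.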
Bibliographic reference. Mori, Shinsuke / Nishimura, Masafumi / Itoh, Nobuyasu (2003): "Language model adaptation using word clustering", In EUROSPEECH-2003, 425-428.