8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Language Model Adaptation Using Word Clustering

Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh

IBM Japan Ltd., Japan

Building a stochastic language model (LM) for speech recognition requires a large corpus from the target task. For some tasks, no sufficiently large corpus is available, which is an obstacle to achieving high recognition accuracy. In this paper, we propose a method that uses large corpora from different tasks to build an LM with higher predictive power than an LM estimated from a small corpus for the specific target task. In our experiment, we used transcriptions of air university lectures and articles from the Nikkei newspaper, and compared an existing interpolation-based method with our new method. The results show that our new method reduces perplexity by 9.71%.
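
For readers unfamiliar with the baseline the abstract mentions, the following is a minimal sketch of generic linear interpolation between a small target-task LM and a large general-corpus LM, evaluated by perplexity. It is not the authors' clustering method and does not reproduce their experiment: the unigram models, the add-alpha smoothing, the toy sentences, and the names unigram_probs, interpolate, and perplexity are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_probs(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_target, p_general, lam):
    """Linear interpolation: lam * P_target(w) + (1 - lam) * P_general(w)."""
    return {w: lam * p_target[w] + (1.0 - lam) * p_general[w] for w in p_target}


def perplexity(probs, tokens):
    """Perplexity: exp of the average negative log-probability per token."""
    log_sum = sum(math.log(probs[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Hypothetical toy corpora (not the lecture or newspaper data from the paper).
target_train = "the lecture covers speech recognition".split()
general_train = "the market report covers the stock market and the economy".split()
held_out = "the lecture covers the economy".split()
vocab = set(target_train) | set(general_train) | set(held_out)

p_t = unigram_probs(target_train, vocab)
p_g = unigram_probs(general_train, vocab)

# Compare held-out perplexity for a few interpolation weights.
for lam in (0.0, 0.5, 1.0):
    print(lam, perplexity(interpolate(p_t, p_g, lam), held_out))
```

In practice the interpolation weight would be tuned on held-out data from the target task rather than fixed by hand, and the component models would be n-gram models rather than unigrams.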

Bibliographic reference. Mori, Shinsuke / Nishimura, Masafumi / Itoh, Nobuyasu (2003): "Language model adaptation using word clustering", in EUROSPEECH-2003, 425-428.