Interspeech'2005 - Eurospeech
Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces a technique to improve an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Perplexity experiments were performed using data machine-translated (MT) from English to French, both on a sentence-by-sentence basis and by dictionary lookup on a word-by-word basis. Perplexity and word error rate experiments were then performed using data translated from English to Icelandic on a word-by-word basis. For the Icelandic experiments, the baseline word error rate was 44.0%; LM interpolation reduced it significantly, to 39.2%.
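The abstract does not state which form of interpolation was used. As a rough illustration only, the sketch below assumes plain linear interpolation between a small task-dependent model and a model trained on translated text, using add-one smoothed unigram models for brevity. The toy sentences, the function names, and the grid of candidate weights are all hypothetical and are not taken from the paper. The last line simply works out the relative word error rate reduction implied by the two reported figures.

```python
import math
from collections import Counter

def train_unigram(tokens, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (assumption: unigrams only)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def interpolate(p_task, p_mt, lam):
    """Linear interpolation: lam * P_task(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_task[w] + (1.0 - lam) * p_mt[w] for w in p_task}

def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))

# Hypothetical toy data standing in for the three corpora.
task_text = "the weather is cold today".split()                       # small task-dependent text
mt_text = "the weather is warm and the wind is cold in the north".split()  # translated text
heldout = "the wind is cold today".split()                            # held-out task text

vocab = set(task_text) | set(mt_text) | set(heldout)
p_task = train_unigram(task_text, vocab)
p_mt = train_unigram(mt_text, vocab)

# Pick the interpolation weight that minimizes held-out perplexity (simple grid search).
weights = [i / 10 for i in range(11)]
best_lam = min(weights, key=lambda lam: perplexity(interpolate(p_task, p_mt, lam), heldout))
mixed = interpolate(p_task, p_mt, best_lam)

# Relative WER reduction implied by the reported 44.0% -> 39.2% (derived arithmetic).
relative_wer_reduction = (44.0 - 39.2) / 44.0
```

Because the grid includes the weights 0 and 1, the selected mixture can never have a higher held-out perplexity than either component model alone. In practice the weight would be tuned on a separate development set and the models would be higher-order n-grams.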
Bibliographic reference. Jensson, Arnar Thor / Whittaker, Edward W. D. / Iwano, Koji / Furui, Sadaoki (2005): "Language model adaptation for resource deficient languages using translated data", In INTERSPEECH-2005, 1329-1332.