Text corpus size is an important factor when building a language model (LM), and it is particularly critical for languages where little data is available. This paper introduces a technique for improving an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Perplexity experiments were performed using data machine translated (MT) from English to French on a sentence-by-sentence basis and using dictionary lookup on a word-by-word basis. Perplexity and word error rate experiments were then performed using data MT from English to Icelandic on a word-by-word basis. For the latter, the baseline word error rate was 44.0%; LM interpolation reduced the word error rate significantly, to 39.2%.
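The abstract's key mechanism is interpolating a small task-dependent LM with an LM estimated from machine-translated text. The sketch below illustrates this idea with simple unigram models and standard linear interpolation; the toy corpora, the interpolation weight `lam`, and the unseen-word floor are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of LM adaptation by linear interpolation, assuming unigram
# models and a fixed interpolation weight (both simplifications of the paper).
import math
from collections import Counter
from typing import List


def train_unigram(sentences: List[List[str]]) -> dict:
    """Maximum-likelihood unigram probabilities estimated from a corpus."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_task: dict, p_mt: dict, lam: float) -> dict:
    """Linear interpolation: P(w) = lam * P_task(w) + (1 - lam) * P_mt(w)."""
    vocab = set(p_task) | set(p_mt)
    return {w: lam * p_task.get(w, 0.0) + (1.0 - lam) * p_mt.get(w, 0.0)
            for w in vocab}


def perplexity(model: dict, sentences: List[List[str]], floor: float = 1e-6) -> float:
    """Per-word perplexity; unseen words fall back to a small floor probability."""
    log_prob, n_words = 0.0, 0
    for s in sentences:
        for w in s:
            log_prob += math.log(model.get(w, floor))
            n_words += 1
    return math.exp(-log_prob / n_words)


if __name__ == "__main__":
    # Toy corpora standing in for the small task-dependent text and the
    # machine-translated text (both hypothetical).
    task_text = [["open", "the", "file"], ["close", "the", "file"]]
    mt_text = [["open", "a", "window"], ["the", "window", "is", "open"]]
    held_out = [["open", "the", "window"]]

    p_task = train_unigram(task_text)
    p_mt = train_unigram(mt_text)
    p_mix = interpolate(p_task, p_mt, lam=0.6)

    print("task-only perplexity   :", round(perplexity(p_task, held_out), 2))
    print("interpolated perplexity:", round(perplexity(p_mix, held_out), 2))
```

On this toy held-out sentence, the interpolated model assigns probability mass to words that the small task-dependent corpus never saw, which is the same effect the paper exploits with MT data, only at a much larger scale and with stronger n-gram models.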
Cite as: Jensson, A.T., Whittaker, E.W.D., Iwano, K., Furui, S. (2005) Language model adaptation for resource deficient languages using translated data. Proc. Interspeech 2005, 1329-1332, doi: 10.21437/Interspeech.2005-29
@inproceedings{jensson05_interspeech,
  author    = {Arnar Thor Jensson and Edward W. D. Whittaker and Koji Iwano and Sadaoki Furui},
  title     = {{Language model adaptation for resource deficient languages using translated data}},
  year      = {2005},
  booktitle = {Proc. Interspeech 2005},
  pages     = {1329--1332},
  doi       = {10.21437/Interspeech.2005-29}
}