ISCA Archive Interspeech 2005

Language model adaptation for resource deficient languages using translated data

Arnar Thor Jensson, Edward W. D. Whittaker, Koji Iwano, Sadaoki Furui

Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces a technique to improve an LM built from a small amount of task-dependent text with the help of a machine-translated (MT) text corpus. Perplexity experiments were performed using data machine translated from English to French on a sentence-by-sentence basis and using dictionary lookup on a word-by-word basis. Perplexity and word error rate experiments were then performed using data machine translated from English to Icelandic on a word-by-word basis. For the latter, the baseline word error rate was 44.0%; LM interpolation reduced it significantly to 39.2%.
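The abstract does not say which form of LM interpolation was used, so the following is only a minimal sketch that assumes plain linear interpolation of two add-one smoothed unigram models. The toy sentences, the function names (`train_unigram`, `interpolate`, `perplexity`) and the grid of interpolation weights are all hypothetical illustrations, not the authors' setup. The sketch shows how a model trained on a small in-domain text might be mixed with a model trained on translated text, with the mixing weight chosen by held-out perplexity.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_small, p_mt, lam):
    """Linear interpolation: lam * P_small(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_small[w] + (1 - lam) * p_mt[w] for w in p_small}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data (assumed for illustration only).
in_domain = "the weather is cold today".split()
translated = "the weather is warm and the wind is cold".split()
held_out = "the wind is cold today".split()
vocab = set(in_domain) | set(translated) | set(held_out)

p_small = train_unigram(in_domain, vocab)   # small task-dependent model
p_mt = train_unigram(translated, vocab)     # model from translated text

# Pick the interpolation weight on a coarse grid by held-out perplexity.
weights = [i / 10 for i in range(11)]
best_lam = min(
    weights,
    key=lambda lam: perplexity(interpolate(p_small, p_mt, lam), held_out),
)
p_mix = interpolate(p_small, p_mt, best_lam)

# Relative word error rate reduction implied by the abstract's figures.
relative_wer_reduction = (44.0 - 39.2) / 44.0

print(best_lam, perplexity(p_mix, held_out), relative_wer_reduction)
```

Because the grid includes the weights 0 and 1, the selected mixture is never worse on the held-out text than either component model alone. The last value printed is the relative reduction implied by the reported word error rates, roughly 10.9%.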


doi: 10.21437/Interspeech.2005-29

Cite as: Jensson, A.T., Whittaker, E.W.D., Iwano, K., Furui, S. (2005) Language model adaptation for resource deficient languages using translated data. Proc. Interspeech 2005, 1329-1332, doi: 10.21437/Interspeech.2005-29

@inproceedings{jensson05_interspeech,
  author={Arnar Thor Jensson and Edward W. D. Whittaker and Koji Iwano and Sadaoki Furui},
  title={{Language model adaptation for resource deficient languages using translated data}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1329--1332},
  doi={10.21437/Interspeech.2005-29}
}