SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages
Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Word error rate experiments on Icelandic were performed using data machine translated (MT) from English to Icelandic on a sentence-by-sentence and a word-by-word basis. The baseline word error rate was 49.6%. Interpolating the baseline LM with an LM built from sentence-by-sentence translated text reduced the word error rate significantly, to 41.9%.
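As an illustration of the LM interpolation mentioned above, the following minimal sketch linearly interpolates two unigram models. It is not the authors' implementation: the unigram order, the add-one smoothing, the toy corpora, the weight `lam`, and the function names `unigram_lm` and `interpolate` are all assumptions made for this example.

```python
from collections import Counter


def unigram_lm(sentences, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1 - lam) * p_mt[w] for w in p_base}


# Hypothetical toy corpora standing in for the small task-dependent text
# and the larger machine-translated text.
base_text = ["a b a", "b c"]
mt_text = ["a c c d", "d d b", "c a d"]

vocab = sorted({w for s in base_text + mt_text for w in s.split()})
p_base = unigram_lm(base_text, vocab)
p_mt = unigram_lm(mt_text, vocab)

# lam is an assumed value; in practice it would be tuned on held-out data.
p_mix = interpolate(p_base, p_mt, lam=0.5)
```

Because both component models are proper distributions over the same vocabulary, the interpolated model also sums to one for any `lam` between 0 and 1. Words unseen in the small corpus (here `d`) receive more probability mass from the machine-translated model.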
Index Terms: Language Model Adaptation, Automatic Speech Recognition, Machine Translation, Sparse Text Corpus, Resource Deficient Languages.
Bibliographic reference. Jensson, Arnar / Iwano, Koji / Furui, Sadaoki (2008): "Development of a speech recognition system for Icelandic using machine translated text", In SLTU-2008, 18-21.