SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages

Hanoi, Vietnam
May 5-7, 2008

Development of a Speech Recognition System for Icelandic Using Machine Translated Text

Arnar Jensson, Koji Iwano, Sadaoki Furui

Tokyo Institute of Technology, Japan

Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic word error rate experiments were performed using data machine translated (MT) from English to Icelandic on a sentence-by-sentence and a word-by-word basis. The baseline word error rate was 49.6%. LM interpolation using the baseline LM and an LM built from sentence-by-sentence translated text reduced the word error rate significantly, to 41.9%.
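The abstract does not say which form of LM interpolation was used. As a minimal sketch, assuming simple linear interpolation of two add-one smoothed unigram models, combining a baseline LM with an LM built from machine-translated text might look like the following. The toy token lists, the helper names `unigram_lm` and `interpolate`, and the weight `lam` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1 - lam) * p_mt[w] for w in p_base}


# Hypothetical toy corpora: a small task-dependent text and a larger
# machine-translated text that covers a word the small text never saw.
base_tokens = "a b a".split()
mt_tokens = "a b c c b c".split()
vocab = sorted(set(base_tokens) | set(mt_tokens))

p_base = unigram_lm(base_tokens, vocab)
p_mt = unigram_lm(mt_tokens, vocab)

# lam is an assumed weight; in practice it would be tuned on held-out data.
p_mix = interpolate(p_base, p_mt, lam=0.5)
```

In this sketch the interpolated model still sums to one over the vocabulary, and the word seen only in the machine-translated text receives more probability mass than it does under the baseline model alone.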

Index Terms: Language Model Adaptation, Automatic Speech Recognition, Machine Translation, Sparse Text Corpus, Resource Deficient Languages.

Bibliographic reference. Jensson, Arnar / Iwano, Koji / Furui, Sadaoki (2008): "Development of a speech recognition system for Icelandic using machine translated text", in SLTU-2008, 18-21.