International Workshop on Spoken Language Translation (IWSLT) 2004

Keihanna Science City, Kyoto, Japan
September 30-October 1, 2004

Statistical Machine Translation of Spontaneous Speech with Scarce Resources

Evgeny Matusov, Maja Popović, Richard Zens, Hermann Ney

Lehrstuhl für Informatik VI - Computer Science Department, RWTH Aachen University, Aachen, Germany

This paper addresses statistical machine translation of spontaneous speech when only a limited amount of training data is available. We propose a method for selecting relevant additional training data from other sources, which may belong to other domains. We present two ways to address the data sparseness problem by incorporating morphological information into the EM training of word alignments. We show that using part-of-speech information to harmonize word order between source and target sentences yields significant improvements in the BLEU score.


Bibliographic reference. Matusov, Evgeny / Popović, Maja / Zens, Richard / Ney, Hermann (2004): "Statistical machine translation of spontaneous speech with scarce resources", in IWSLT-2004, 139-146.