International Workshop on Spoken Language Translation (IWSLT) 2004
Keihanna Science City, Kyoto, Japan
This paper deals with the task of statistical machine translation of spontaneous speech using a limited amount of training data. We propose a method for selecting relevant additional training data from other sources, which may come from other domains. We present two ways to address the data sparseness problem by incorporating morphological information into the EM training of word alignments. We show that using part-of-speech information to harmonize word order between source and target sentences yields significant improvements in the BLEU score.
Bibliographic reference. Matusov, Evgeny / Popović, Maja / Zens, Richard / Ney, Hermann (2004): "Statistical machine translation of spontaneous speech with scarce resources", In IWSLT-2004, 139-146.