International Workshop on Spoken Language Translation (IWSLT) 2012
This paper describes NICT's participation in the IWSLT 2012 evaluation campaign for the TED speech translation Russian-English shared task. Our approach was based on a phrase-based statistical machine translation system augmented with transliteration mining techniques. The basic premise was to use sub-word-level alignments to guide the word-level alignment process used to learn the phrase table. We did this by first mining a corpus of Russian-English transliteration pairs and cognates from a set of interlanguage link titles from Wikipedia. This corpus was then used to build a many-to-many nonparametric Bayesian bilingual alignment model that could identify occurrences of transliterations and cognates in the training corpus itself. Alignment counts for these mined pairs were increased in the training corpus to raise the likelihood that the pairs would align in training. Our experiments on the test sets from the 2010 and 2011 shared tasks showed that encouraging the alignment of cognates and transliterations during word alignment can improve BLEU score.
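The abstract does not say how the alignment counts were increased. As a minimal sketch only, one simple way to get that effect is to append each mined pair found in a sentence pair to the parallel corpus as an extra one-word sentence pair before running the word aligner. The function name `boost_mined_pairs`, the `weight` parameter, and the whitespace tokenization are assumptions made for illustration and are not taken from the paper.

```python
def boost_mined_pairs(corpus, mined_pairs, weight=5):
    """Return the corpus plus extra one-word sentence pairs for mined pairs.

    corpus      -- list of (source_sentence, target_sentence) strings
    mined_pairs -- iterable of (source_word, target_word) tuples, e.g.
                   transliterations or cognates mined elsewhere
    weight      -- hypothetical number of extra copies added per co-occurrence
    """
    mined = set(mined_pairs)
    extra = []
    for src_sent, tgt_sent in corpus:
        # Assumption: sentences are already tokenized on whitespace.
        for src_word in src_sent.split():
            for tgt_word in tgt_sent.split():
                if (src_word, tgt_word) in mined:
                    # Repeating the pair raises its co-occurrence count,
                    # which nudges a count-based aligner to link the words.
                    extra.extend([(src_word, tgt_word)] * weight)
    return list(corpus) + extra


corpus = [("москва столица", "moscow is the capital")]
mined_pairs = [("москва", "moscow")]
boosted = boost_mined_pairs(corpus, mined_pairs, weight=3)
```

Here `boosted` keeps the original sentence pair and adds three copies of the mined pair. The system described in the paper may have adjusted the counts differently, for example inside the aligner itself.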
Bibliographic reference. Finch, Andrew / Htun, Ohnmar / Sumita, Eiichiro (2012): "The NICT translation system for IWSLT 2012", In IWSLT-2012, 121-125.