International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

The NICT Translation System for IWSLT 2011

Andrew Finch (1), Chooi-Ling Goh (1), Graham Neubig (2), Eiichiro Sumita (1)

(1) Multilingual Translation Group, MASTAR Project, National Institute of Information and Communications Technology, Kyoto, Japan
(2) Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan

This paper describes NICT's participation in the IWSLT 2011 evaluation campaign for the TED speech translation Chinese-English shared task. Our approach was based on a phrase-based statistical machine translation system that was augmented in two ways.
   Firstly, we introduced rule-based reordering constraints on the decoding. A set of rules was used to divide the input utterances into segments that could be decoded almost independently. The idea is that constraining the decoding process in this manner greatly reduces the search space of the decoder and cuts out many possibilities for error, while still allowing a correct output to be generated. The rules exploit punctuation and spacing in the input utterances, and we use these positions to delimit our segments. Not all punctuation/spacing positions were used as segment boundaries; the set of positions used was determined by a set of linguistically-based heuristics.
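The segmentation idea in the preceding paragraph can be illustrated with a minimal sketch. The boundary set and the minimum-length check below are hypothetical stand-ins chosen for illustration; the abstract does not specify the actual rules or heuristics, and the names `BOUNDARY_TOKENS`, `segment`, and `min_len` are assumptions.

```python
# Hypothetical boundary tokens; the real rule set is not given in the abstract.
BOUNDARY_TOKENS = {",", ".", ";", "?", "!", "，", "。", "；", "？", "！"}


def segment(tokens, min_len=3):
    """Split a token list into segments at selected boundary tokens.

    A boundary token closes the current segment only if that segment
    already holds at least `min_len` tokens (boundary included). This
    length check is an assumed placeholder for the linguistically-based
    heuristics that decide which positions are used.
    """
    segments = []
    current = []
    for tok in tokens:
        current.append(tok)
        if tok in BOUNDARY_TOKENS and len(current) >= min_len:
            segments.append(current)
            current = []
    if current:
        # Keep any trailing tokens that were not closed by a boundary.
        segments.append(current)
    return segments


if __name__ == "__main__":
    for seg in segment(["a", "b", ",", "c", ",", "d", "e", "f", "."]):
        print(" ".join(seg))
```

Each returned segment could then be passed to a decoder as a unit within which reordering is permitted, with little or no reordering across segment boundaries.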
   Secondly, we used two heterogeneous methods to build the translation model and lexical reordering model for our systems. The first method employed the popular approach of using GIZA++ for alignment in combination with phrase-extraction heuristics. The second method used a recently developed Bayesian alignment technique that is able to perform both phrase-to-phrase alignment and phrase pair extraction within a single unsupervised process. The models produced by this type of alignment technique are typically very compact while maintaining a high level of translation quality. We evaluated both of these methods of translation model construction in isolation, and our results show that their performance is comparable. We also integrated both models by linear interpolation to obtain a model that outperforms either component. Finally, we added an indicator feature to the log-linear model to mark those phrases that were in the intersection of the two translation models. The addition of this feature also provided a small improvement in performance.
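The model combination described above can be sketched as follows, assuming each translation model is reduced to a single probability per phrase pair. The dictionary representation, the function name `interpolate`, the `weight` parameter, and the toy entries are all illustrative assumptions, not details taken from the paper.

```python
def interpolate(table_a, table_b, weight=0.5):
    """Linearly interpolate two phrase tables and add an intersection flag.

    Each table maps a (source_phrase, target_phrase) pair to a probability.
    A pair missing from one table contributes probability 0.0 from that
    table. The second element of each value is an indicator feature:
    1.0 if the pair occurs in both tables, 0.0 otherwise.
    """
    merged = {}
    for pair in set(table_a) | set(table_b):
        p_a = table_a.get(pair, 0.0)
        p_b = table_b.get(pair, 0.0)
        prob = weight * p_a + (1.0 - weight) * p_b
        in_both = 1.0 if (pair in table_a and pair in table_b) else 0.0
        merged[pair] = (prob, in_both)
    return merged


if __name__ == "__main__":
    # Toy entries for illustration only.
    heuristic_table = {("mao", "cat"): 0.5, ("gou", "dog"): 0.5}
    bayesian_table = {("mao", "cat"): 0.25}
    for pair, features in sorted(interpolate(heuristic_table, bayesian_table).items()):
        print(pair, features)
```

In a log-linear model, the interpolated probability would typically enter as a log-probability feature, while the indicator would be a separate feature with its own tuned weight.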


Bibliographic reference.  Finch, Andrew / Goh, Chooi-Ling / Neubig, Graham / Sumita, Eiichiro (2011): "The NICT translation system for IWSLT 2011", In IWSLT-2011, 49-56.