We present the CMU Syntax Augmented Machine Translation System that was used in the IWSLT-08 evaluation campaign. We participated in the Full-BTEC data track for Chinese-English translation, focusing on transcript translation. For this year's evaluation, we ported the Syntax Augmented MT toolkit [1] to the Hadoop MapReduce [2] parallel processing architecture. This allowed us to efficiently run experiments evaluating a novel “wider pipelines” approach that integrates evidence from N-best alignments into our translation models. We describe each step of the MapReduce pipeline as it is implemented in the open-source SAMT toolkit, and show that using N-best alignments improves translation quality in both hierarchical and syntax-augmented translation systems.
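To make the idea of pooling evidence from N-best alignments concrete, the following is a minimal sketch in a map/reduce style. It is not the SAMT toolkit's implementation: it assumes that each sentence pair comes with several candidate alignments, that each alignment carries a score that can be normalized into a weight, and that rule extraction has already produced a list of rules per alignment. The function names, the rule strings, and the scores are all hypothetical.

```python
from collections import defaultdict


def map_sentence(nbest):
    """Map step (hypothetical): emit (rule, weight) pairs for one sentence pair.

    nbest is a list of (alignment_score, rules) tuples, one per candidate
    alignment. Scores are normalized so the weights for a sentence sum to 1.
    """
    total = sum(score for score, _ in nbest)
    for score, rules in nbest:
        weight = score / total
        for rule in rules:
            yield rule, weight


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the fractional counts for each rule."""
    counts = defaultdict(float)
    for rule, weight in pairs:
        counts[rule] += weight
    return dict(counts)


# Toy corpus: two sentence pairs with made-up alignment scores and rules.
corpus = [
    [(0.75, ["X -> <a, b>", "X -> <c, d>"]), (0.25, ["X -> <a, b>"])],
    [(1.0, ["X -> <a, b>"])],
]

pairs = [pair for sentence in corpus for pair in map_sentence(sentence)]
counts = reduce_counts(pairs)
```

With a single alignment per sentence this reduces to ordinary integer rule counts; with several, a rule supported only by a lower-scoring alignment still contributes a fractional count instead of being discarded.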
Cite as: Zollmann, A., Venugopal, A., Vogel, S. (2008) The CMU syntax-augmented machine translation system: SAMT on Hadoop with n-best alignments. Proc. International Workshop on Spoken Language Translation (IWSLT 2008), 18-25
@inproceedings{zollmann08_iwslt,
  author={Andreas Zollmann and Ashish Venugopal and Stephan Vogel},
  title={{The CMU syntax-augmented machine translation system: SAMT on Hadoop with n-best alignments}},
  year=2008,
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2008)},
  pages={18--25}
}