ISCA Archive IWSLT 2010

CCG augmented hierarchical phrase-based machine translation

Hala Almaghout, Jie Jiang, Andy Way

We present a method for incorporating target-language syntax, in the form of Combinatory Categorial Grammar (CCG), into the Hierarchical Phrase-Based MT system. We adopt the approach followed by Syntax Augmented Machine Translation (SAMT) of attaching syntactic categories to nonterminals in hierarchical rules, but instead of using constituent grammar, we take advantage of the rich syntactic information and flexible structures of CCG. We present results on Chinese-English DIALOG IWSLT data and compare them with the Moses SAMT4 and Moses Phrase-Based systems. Our results show relative BLEU score increases of 5.47% and 1.18% over the Moses SAMT4 and Phrase-Based systems, respectively. We analyse the reasons behind this improvement and find that our approach has better coverage than the SAMT approach. Furthermore, the CCG-based syntactic categories attached to nonterminals in hierarchical rules prove to be less sparse and generalize better than syntactic categories extracted according to the SAMT method.
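As an illustration of how a CCG category could serve as the label of a phrase, and hence of a nonterminal in a hierarchical rule, the following is a minimal sketch and not the system described above. It assumes that each target word already carries a CCG supertag, uses only forward and backward application (a full CCG parser also uses composition and type raising), and falls back to a generic label `X` when the span does not reduce to a single category. The function names, the example supertags, and the `X` fallback are all assumptions made for this sketch.

```python
from typing import List, Optional, Tuple


def strip_parens(cat: str) -> str:
    """Drop one pair of outer parentheses if they enclose the whole category."""
    if cat.startswith("(") and cat.endswith(")"):
        depth = 0
        for i, ch in enumerate(cat):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i < len(cat) - 1:
                    return cat  # the first "(" closes before the end
        return cat[1:-1]
    return cat


def split_category(cat: str) -> Optional[Tuple[str, str, str]]:
    """Split at the outermost slash (rightmost at depth 0, since CCG
    categories associate to the left). Returns (result, slash, argument),
    or None for an atomic category such as NP."""
    cat = strip_parens(cat)
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        ch = cat[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return strip_parens(cat[:i]), ch, strip_parens(cat[i + 1:])
    return None


def combine(left: str, right: str) -> Optional[str]:
    """Forward application (X/Y  Y => X) or backward application (Y  X\\Y => X)."""
    l = split_category(left)
    if l is not None and l[1] == "/" and l[2] == strip_parens(right):
        return l[0]
    r = split_category(right)
    if r is not None and r[1] == "\\" and r[2] == strip_parens(left):
        return r[0]
    return None


def span_label(supertags: List[str], default: str = "X") -> str:
    """CKY-style reduction of a span of supertags to a single category.
    Returns `default` (an assumed generic label) if no derivation exists."""
    n = len(supertags)
    if n == 0:
        return default
    chart = {(i, i + 1): {strip_parens(t)} for i, t in enumerate(supertags)}
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for a in chart[(i, k)]:
                    for b in chart[(k, j)]:
                        c = combine(a, b)
                        if c is not None:
                            cell.add(c)
            chart[(i, j)] = cell
    top = chart[(0, n)]
    return sorted(top)[0] if top else default


# Hypothetical supertags for a transitive verb followed by its object.
verb_phrase = span_label(["(S\\NP)/NP", "NP"])
```

Under these assumptions, a verb followed by its object receives the single category `S\NP`, which could then label the nonterminal covering that span, whereas a span with no derivation (for example two adjacent `NP` supertags) receives the fallback label.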

Cite as: Almaghout, H., Jiang, J., Way, A. (2010) CCG augmented hierarchical phrase-based machine translation. Proc. International Workshop on Spoken Language Translation (IWSLT 2010), 211-218

  author={Hala Almaghout and Jie Jiang and Andy Way},
  title={{CCG augmented hierarchical phrase-based machine translation}},
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2010)},