ISCA Archive IWSLT 2005

N-gram-based versus phrase-based statistical machine translation

Josep M. Crego, Marta R. Costa-jussà, José B. Mariño, José A. R. Fonollosa

This work summarizes a comparison between two approaches to Statistical Machine Translation (SMT): N-gram-based and phrase-based SMT. In both approaches, the translation process is based on bilingual units related by word-to-word alignments (pairs of source and target words); the main differences lie in how these units are extracted and in how the translation context is modeled statistically. The study was carried out on two translation tasks that differ in translation difficulty and in the amount of available training data, and it allows for distortion (reordering) in the decoding process. It thus extends previous work in which both approaches were compared under monotone conditions. Finally, we report comparative results in terms of translation accuracy, computation time, and memory size. The results show that the N-gram-based approach outperforms the phrase-based approach by achieving similar accuracy scores in less computation time and with lower memory requirements.

Cite as: Crego, J.M., Costa-jussà, M.R., Mariño, J.B., Fonollosa, J.A.R. (2005) N-gram-based versus phrase-based statistical machine translation. Proc. International Workshop on Spoken Language Translation (IWSLT 2005), 167-174

  author={Josep M. Crego and Marta R. Costa-jussà and José B. Mariño and José A. R. Fonollosa},
  title={{N-gram-based versus phrase-based statistical machine translation}},
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2005)},