International Workshop on Spoken Language Translation (IWSLT) 2005

Pittsburgh, PA, USA
October 24-25, 2005

N-Gram-based versus Phrase-based Statistical Machine Translation

Josep M. Crego, Marta R. Costa-jussà, José B. Mariño, José A. R. Fonollosa

TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain

This work summarizes a comparison between two approaches to Statistical Machine Translation (SMT), namely N-gram-based and phrase-based SMT. In both approaches, the translation process is based on bilingual units related by word-to-word alignments (pairs of source and target words); they differ mainly in how these units are extracted and in how the translation context is modeled statistically. The study has been carried out on two translation tasks that differ in translation difficulty and in the amount of available training data, and it allows for distortion (reordering) in the decoding process. It thus extends previous work in which both approaches were compared under monotone conditions. Finally, we report comparative results in terms of translation accuracy, computation time and memory size. The results show that the N-gram-based approach outperforms the phrase-based approach in efficiency, achieving similar accuracy scores with less computation time and lower memory requirements.
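
The difference in statistical modeling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names (`phrase_table`, `tuple_bigram_model`) and the use of unsmoothed bigrams are assumptions made for illustration only. The first function scores each bilingual unit in isolation by relative frequency, as a phrase table commonly does; the second conditions each unit on the preceding unit, as an N-gram model over bilingual units does.

```python
from collections import Counter

# Hypothetical toy data: each sentence pair is already segmented into
# bilingual units (source side, target side). Not taken from the paper.
corpus = [
    [("we", "nosotros"), ("want", "queremos"), ("a room", "una habitación")],
    [("we", "nosotros"), ("want", "queremos"), ("coffee", "café")],
    [("they", "ellos"), ("want", "quieren"), ("coffee", "café")],
]

def phrase_table(corpus):
    """Context-free relative frequency: p(target | source) per unit."""
    pair_counts = Counter(unit for sent in corpus for unit in sent)
    src_counts = Counter(src for sent in corpus for src, _ in sent)
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}

def tuple_bigram_model(corpus):
    """Unsmoothed bigram model over bilingual units: p(unit_k | unit_{k-1})."""
    start = ("<s>", "<s>")  # sentence-start marker (an assumption of this sketch)
    bigrams, history = Counter(), Counter()
    for sent in corpus:
        prev = start
        for unit in sent:
            bigrams[(prev, unit)] += 1
            history[prev] += 1
            prev = unit
    return {bg: c / history[bg[0]] for bg, c in bigrams.items()}

pt = phrase_table(corpus)
bg = tuple_bigram_model(corpus)

# Without context, "want" is ambiguous between two target forms.
print(pt[("want", "queremos")], pt[("want", "quieren")])
# With the previous bilingual unit as context, the choice is determined.
print(bg[(("we", "nosotros"), ("want", "queremos"))])
```

In this toy setting the context-free estimate splits probability mass between the two translations of "want", while the bigram over bilingual units resolves the choice from the preceding unit. Practical systems would add smoothing, longer histories and further feature functions, none of which are shown here.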


Bibliographic reference.  Crego, Josep M. / Costa-jussà, Marta R. / Mariño, José B. / Fonollosa, José A. R. (2005): "N-gram-based versus phrase-based statistical machine translation", In IWSLT-2005, 167-174.