International Workshop on Spoken Language Translation (IWSLT) 2008

Honolulu, Hawaii, USA
October 20-21, 2008

The TALP&I2R SMT Systems for IWSLT 2008

Maxim Khalilov (1), Marta R. Costa-jussà (1), Carlos A. Henríquez (1), José A. R. Fonollosa (1), Adolfo Hernández (1), José B. Mariño (1), Rafael E. Banchs (1), Chen Boxing (2), Min Zhang (2), Aiti Aw (2), Haizhou Li (2)

(1) TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
(2) Department of Human Language Technology, Institute for Infocomm Research, Singapore

This paper describes the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT'08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we used, focusing mainly on the new techniques intended to improve speech-to-speech translation quality. The novelties we introduce are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system.
This year we focus on the Arabic-English, Chinese-Spanish, and pivot Chinese-(English)-Spanish translation tasks.


Bibliographic reference. Khalilov, Maxim / Costa-jussà, Marta R. / Henríquez, Carlos A. / Fonollosa, José A. R. / Hernández, Adolfo / Mariño, José B. / Banchs, Rafael E. / Boxing, Chen / Zhang, Min / Aw, Aiti / Li, Haizhou (2008): "The TALP&I2R SMT Systems for IWSLT 2008", in IWSLT-2008, 116-123.