International Workshop on Spoken Language Translation (IWSLT) 2006

Keihanna Science City, Kyoto, Japan
November 27-28, 2006

The TALP N-Gram-based SMT System for IWSLT 2006

Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, Marta R. Costa-jussà, José B. Mariño, Rafael E. Banchs, José A. R. Fonollosa

TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain

This paper describes TALPtuples, the 2006 N-gram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) in Barcelona. Emphasis is placed on improvements and extensions over the previous years' system, which are highlighted and empirically compared. These mainly include a novel and much more efficient word ordering strategy based on reordering patterns, a linguistically guided tuple segmentation criterion, and improved optimization procedures.
   The paper details the system's participation in the third International Workshop on Spoken Language Translation (IWSLT), held in Kyoto, Japan, in November 2006. Results are reported for four translation directions in the open data track, namely from Arabic, Chinese, Italian and Japanese into English, and all language-related preprocessing and optimization schemes are explained thoroughly.

Bibliographic reference.  Crego, Josep M. / Gispert, Adrià de / Lambert, Patrik / Khalilov, Maxim / Costa-jussà, Marta R. / Mariño, José B. / Banchs, Rafael E. / Fonollosa, José A. R. (2006): "The TALP n-gram-based SMT system for IWSLT 2006", In IWSLT-2006, 116-122.