International Workshop on Spoken Language Translation (IWSLT) 2006

Keihanna Science City, Kyoto, Japan
November 27-28, 2006

Toward Integrating Word Sense and Entity Disambiguation into Statistical Machine Translation

Marine Carpuat, Yihai Shen, Xiaofeng Yu, Dekai Wu

Human Language Technology Center, Department of Computer Science, Hong Kong University of Science and Technology (HKUST), Hong Kong

We describe a machine translation approach under development at HKUST that integrates semantic processing into statistical machine translation, beginning with entity and word sense disambiguation. We show that integrating the semantic modules consistently improves translation quality across several data sets. We report results on five IWSLT 2006 speech translation tasks, which represent HKUST's first participation in the IWSLT spoken language translation evaluation campaign. We translated both read and spontaneous speech transcriptions from Chinese to English, achieving reasonable performance even though our system is essentially text-based and therefore neither designed nor tuned for the challenges of speech translation. Evaluating on read speech transcriptions from Arabic, Italian, and Japanese into English, we also find that the system achieves reasonable results across a range of source languages.


Bibliographic reference. Carpuat, Marine / Shen, Yihai / Yu, Xiaofeng / Wu, Dekai (2006): "Toward integrating word sense and entity disambiguation into statistical machine translation", in IWSLT-2006, 37-44.