International Workshop on Spoken Language Translation (IWSLT) 2009

Tokyo, Japan
December 1-2, 2009

The CASIA Statistical Machine Translation System for IWSLT 2009

Maoxi Li, Jiajun Zhang, Yu Zhou, Chengqing Zong

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

This paper reports on the participation of CASIA (Institute of Automation, Chinese Academy of Sciences) in the evaluation campaign of the International Workshop on Spoken Language Translation 2009. We participated in the challenge tasks for Chinese-to-English and English-to-Chinese translation, and in the BTEC task for Chinese-to-English translation only. For all of the tasks, system performance is improved with the following methods: 1) combining different results of Chinese word segmentation, 2) combining different results of word alignment, 3) adding reliable bilingual word pairs with high probabilities to the training data, 4) additionally handling named entities, including person names, location names, organization names, and temporal and numerical expressions, 5) combining and selecting translations from the outputs of multiple translation engines, and 6) replacing Chinese characters with Chinese Pinyin to train the translation model for the Chinese-to-English ASR challenge task. This last approach is new and has not been introduced before.
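As an illustration of method 6, the following minimal Python sketch shows what replacing Chinese characters with Pinyin on the source side might look like before training. It is not the authors' implementation: the toy lexicon `TOY_PINYIN`, the function `to_pinyin`, the toneless one-syllable-per-character output, and the pass-through of unknown symbols are all assumptions made for the example.

```python
# Hypothetical toy lexicon: Chinese character -> toneless Pinyin syllable.
# Assumption: a real system would need a full pronunciation dictionary and
# a policy for polyphonic characters; the abstract describes neither.
TOY_PINYIN = {
    "我": "wo",
    "想": "xiang",
    "去": "qu",
    "东": "dong",
    "京": "jing",
    "冬": "dong",
}


def to_pinyin(sentence, lexicon=TOY_PINYIN):
    """Replace each known character with its Pinyin syllable.

    Characters missing from the lexicon are kept unchanged, and
    whitespace in the input is dropped before joining with spaces.
    """
    return " ".join(lexicon.get(ch, ch) for ch in sentence if not ch.isspace())


if __name__ == "__main__":
    print(to_pinyin("我想去东京"))  # wo xiang qu dong jing
    # Homophonous characters map to the same syllable in this toy lexicon.
    print(to_pinyin("东京") == to_pinyin("冬京"))  # True
```

In this sketch, characters that share a pronunciation collapse to the same token, so a source sentence containing a homophone substitution is mapped to the same Pinyin string as the intended sentence.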


Bibliographic reference.  Li, Maoxi / Zhang, Jiajun / Zhou, Yu / Zong, Chengqing (2009): "The CASIA statistical machine translation system for IWSLT 2009", In IWSLT-2009, 83-90.