In this paper we describe the CMU statistical machine translation system used in the IWSLT 2005 evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. We experimented with two different phrase extraction methods: PESA on-the-fly phrase extraction and an alignment-free extraction method. The translation model, language model and other features were combined in a log-linear model during decoding. We present our experiments on model adaptation for new data in a different domain, as well as on combining different translation hypotheses to obtain better translations. We participated in the supplied data track for manual transcriptions in the translation directions Arabic-English, Chinese-English, Japanese-English and Korean-English. For the Chinese-English direction we also worked on ASR output of the supplied data, and with additional data in the unrestricted and C-STAR tracks.
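As a rough illustration of the log-linear combination mentioned above, the sketch below scores each translation hypothesis as a weighted sum of log-domain feature values and keeps the highest-scoring one. It is not the CMU decoder: the feature names (`tm`, `lm`, `length`), the weights, and the probabilities are all hypothetical values chosen only for the example.

```python
import math

def loglinear_score(features, weights):
    """Return the weighted sum of feature values (log-linear model score)."""
    return sum(weights[name] * value for name, value in features.items())

def best_hypothesis(hypotheses, weights):
    """Pick the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))

# Hypothetical feature weights; in practice these would be tuned on held-out data.
weights = {"tm": 1.0, "lm": 0.5, "length": 0.1}

# Two made-up hypotheses with translation-model and language-model
# log-probabilities plus a simple length feature.
hypotheses = [
    {"text": "hypothesis A",
     "features": {"tm": math.log(0.4), "lm": math.log(0.05), "length": 3}},
    {"text": "hypothesis B",
     "features": {"tm": math.log(0.3), "lm": math.log(0.20), "length": 3}},
]

best = best_hypothesis(hypotheses, weights)
print(best["text"])
```

In this toy setting the second hypothesis wins: its translation-model probability is lower, but the weighted language-model term more than makes up the difference. The same mechanism lets any additional feature influence the decision through its weight.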
Cite as: Hewavitharana, S., Zhao, B., Hildebrand, A.S., Eck, M., Hori, C., Vogel, S., Waibel, A. (2005) The CMU statistical machine translation system for IWSLT 2005. Proc. International Workshop on Spoken Language Translation (IWSLT 2005), 53-60
@inproceedings{hewavitharana05_iwslt,
  author={Sanjika Hewavitharana and Bing Zhao and Almut Silja Hildebrand and Matthias Eck and Chiori Hori and Stephan Vogel and Alex Waibel},
  title={{The CMU statistical machine translation system for IWSLT 2005}},
  year=2005,
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2005)},
  pages={53--60}
}