International Workshop on Spoken Language Translation (IWSLT) 2012

Hong Kong
December 6-7, 2012

The KIT Translation Systems for IWSLT 2012

Mohammed Mediani, Yuqi Zhang, Thanh-Le Ha, Jan Niehues, Eunah Cho, Teresa Herrmann, Rainer Kärgel, Alex Waibel

Institute of Anthropomatics, KIT - Karlsruhe Institute of Technology, Karlsruhe, Germany

In this paper, we present the KIT systems participating in the English-French TED Translation tasks in the framework of the IWSLT 2012 machine translation evaluation. We also present several additional experiments on the English-German, English-Chinese and English-Arabic translation pairs.
    Our system is a phrase-based statistical machine translation system, extended with many additional models that have been shown to improve translation quality. For instance, it uses part-of-speech (POS)-based reordering, translation and language model adaptation, a bilingual language model, a word-cluster language model, discriminative word lexica (DWL), and a continuous space language model.
    In addition, the system incorporates special preprocessing and postprocessing steps. In preprocessing, the noisy corpora are filtered by removing noisy sentence pairs, whereas in postprocessing, the agreement between a noun and its surrounding words in the French translation is corrected based on POS tags with morphological information.
    To handle speech transcription input, our system removes case information and all punctuation except periods from the text translation model.
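
    The normalization for speech transcription input can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `normalize_for_speech` and the set of punctuation marks in `STRIP_CHARS` are assumptions made for illustration, and the abstract does not say how apostrophes or hyphens are treated (they are left untouched here).

```python
# Hypothetical set of punctuation marks to drop; the period is deliberately absent.
STRIP_CHARS = ',;:!?"()'
_STRIP_TABLE = str.maketrans("", "", STRIP_CHARS)


def normalize_for_speech(line: str) -> str:
    """Lowercase a line and remove punctuation other than periods."""
    lowered = line.lower().translate(_STRIP_TABLE)
    # Collapse whitespace left behind where standalone punctuation tokens were removed.
    return " ".join(lowered.split())


if __name__ == "__main__":
    print(normalize_for_speech("Well , this is the KIT system ."))
```

    Applying the same function to the source side of the training data would make the translation model's input resemble lowercased transcripts that carry only sentence-final periods.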


Bibliographic reference.  Mediani, Mohammed / Zhang, Yuqi / Ha, Thanh-Le / Niehues, Jan / Cho, Eunah / Herrmann, Teresa / Kärgel, Rainer / Waibel, Alex (2012): "The KIT translation systems for IWSLT 2012", In IWSLT-2012, 38-45.