International Workshop on Spoken Language Translation (IWSLT) 2012
In this paper, we present the KIT systems participating in the English-French TED Translation tasks in the framework of the IWSLT 2012 machine translation evaluation. We also present several additional experiments on the English-German, English-Chinese and English-Arabic translation pairs.
Our system is a phrase-based statistical machine translation system, extended with several additional models that have been shown to improve translation quality. These include part-of-speech (POS)-based reordering, translation and language model adaptation, a bilingual language model, a word-cluster language model, discriminative word lexica (DWL), and a continuous space language model.
In addition, the system incorporates special preprocessing and postprocessing steps. In preprocessing, the noisy corpora are filtered by removing noisy sentence pairs. In postprocessing, the agreement between a noun and its surrounding words in the French translation is corrected based on POS tags with morphological information.
Our system handles speech transcription input by removing case information and all punctuation except periods from the text translation model.
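The normalization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `to_asr_style` is hypothetical, and the sketch assumes that "punctuation" means the ASCII punctuation set, so apostrophes and hyphens are dropped as well. The actual system may treat those characters, and non-ASCII punctuation, differently.

```python
import string

# ASCII punctuation without the period, which is the one mark that is kept.
# Assumption: the original system's punctuation inventory may differ.
_PUNCT_TO_DROP = string.punctuation.replace(".", "")
_DROP_TABLE = str.maketrans("", "", _PUNCT_TO_DROP)


def to_asr_style(sentence: str) -> str:
    """Lowercase a sentence, strip punctuation except periods, and
    collapse any runs of whitespace left behind by the removal."""
    lowered = sentence.lower()                 # remove case information
    stripped = lowered.translate(_DROP_TABLE)  # drop punctuation except "."
    return " ".join(stripped.split())          # normalise whitespace


if __name__ == "__main__":
    print(to_asr_style("Hello, World! This is a test."))
```

Applying the same function to the source side of the training data would make the text the translation model sees resemble unpunctuated, uncased speech transcripts, which is the mismatch this step is meant to reduce.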
Bibliographic reference. Mediani, Mohammed / Zhang, Yuqi / Ha, Thanh-Le / Niehues, Jan / Cho, Eunah / Herrmann, Teresa / Kärgel, Rainer / Waibel, Alex (2012): "The KIT translation systems for IWSLT 2012", In IWSLT-2012, 38-45.