In this paper, we present the KIT systems participating in the English-French TED translation tasks within the framework of the IWSLT 2012 machine translation evaluation. We also present several additional experiments on the English-German, English-Chinese and English-Arabic translation pairs. Our system is a phrase-based statistical machine translation system extended with many additional models that have been shown to improve translation quality. These include part-of-speech (POS)-based reordering, translation and language model adaptation, a bilingual language model, a word-cluster language model, discriminative word lexica (DWL), and a continuous space language model. In addition, the system incorporates dedicated preprocessing and post-processing steps. In preprocessing, the noisy corpora are filtered by removing noisy sentence pairs. In post-processing, the agreement between a noun and its surrounding words in the French translation is corrected using POS tags that carry morphological information. Our system handles speech transcription input by removing case information and all punctuation except periods from the text translation model.
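The speech-input handling described above can be illustrated with a minimal sketch. It assumes the text is available as one sentence per string and treats every Unicode punctuation character other than "." as punctuation to be removed. The function name `normalize_for_speech_input` is hypothetical and does not come from the KIT system; the actual tokenization and punctuation inventory used there may differ.

```python
import unicodedata


def normalize_for_speech_input(sentence: str) -> str:
    """Hypothetical helper: lowercase a sentence and drop all punctuation except '.'.

    Assumption: "punctuation" means any character whose Unicode general
    category starts with 'P' (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    """
    # Remove case information.
    lowered = sentence.lower()
    kept = []
    for ch in lowered:
        # Skip punctuation characters, but keep periods.
        if unicodedata.category(ch).startswith("P") and ch != ".":
            continue
        kept.append(ch)
    # Collapse whitespace runs left behind by removed punctuation tokens.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    print(normalize_for_speech_input("Hello , World ! This is TED ."))
```

Run on an already tokenized sentence, the sketch keeps the sentence-final period and discards the comma and exclamation mark. This approximates the shape of ASR output, which typically lacks case and most punctuation.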
Cite as: Mediani, M., Zhang, Y., Ha, T.-L., Niehues, J., Cho, E., Herrmann, T., Kärgel, R., Waibel, A. (2012) The KIT translation systems for IWSLT 2012. Proc. International Workshop on Spoken Language Translation (IWSLT 2012), 38-45
@inproceedings{mediani12_iwslt, author={Mohammed Mediani and Yuqi Zhang and Thanh-Le Ha and Jan Niehues and Eunah Cho and Teresa Herrmann and Rainer Kärgel and Alex Waibel}, title={{The KIT translation systems for IWSLT 2012}}, year=2012, booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2012)}, pages={38--45} }