8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Improved Machine Translation of Speech-to-Text outputs

Daniel Déchelotte, Holger Schwenk, Gilles Adda, Jean-Luc Gauvain

LIMSI, France

Combining automatic speech recognition and machine translation is common in current research programs. This paper first presents several pre-processing steps that limit the performance degradation observed when translating an automatic transcription rather than a manual one. Automatically transcribed speech often differs significantly from the machine translation system's training material with respect to casing, punctuation, and word normalization. The proposed system outperforms the best system at the 2007 TC-STAR evaluation by almost 2 BLEU points. The paper then attempts to determine a criterion characterizing how well the output of an STT system can be translated, but the current experiments could only confirm that lower word error rates lead to better translations.


Bibliographic reference. Déchelotte, Daniel / Schwenk, Holger / Adda, Gilles / Gauvain, Jean-Luc (2007): "Improved machine translation of speech-to-text outputs", in INTERSPEECH-2007, 2441-2444.