International Workshop on Spoken Language Translation (IWSLT) 2012

Hong Kong
December 6-7, 2012

Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System

Eunah Cho, Jan Niehues, Alex Waibel

International Center for Advanced Communication Technologies - InterACT, Institute of Anthropomatics, Karlsruhe Institute of Technology, Germany

In spoken language translation (SLT), finding a proper segmentation and reconstructing punctuation marks are tasks that are both important and challenging. In this paper we present our recent work on analyzing and improving German-English speech translation quality through better sentence segmentation and punctuation.
From oracle experiments, we derive an upper bound on the translation quality that could be reached if human-generated segmentation and punctuation were available on the output stream of speech recognition systems. In these oracle experiments we gain 1.78 BLEU points on the lecture test set. We then build a monolingual translation system from German to German that treats segmentation and punctuation prediction as a machine translation task. Using this monolingual translation system we obtain an improvement of 1.53 BLEU points on the lecture test set, about 86% of the gain from the oracle experiments and thus close to the upper bound they establish.
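
As a rough illustration of what treating punctuation prediction as a monolingual translation task can look like, the sketch below builds training pairs in which the source side is an unpunctuated word stream and the target side is the same text with punctuation restored. The abstract does not describe the actual preprocessing, so the punctuation inventory (`PUNCT`), the fixed window length, and the helper names `tokenize` and `make_pairs` are all assumptions made for this example, not the authors' method. Windows are cut by word count rather than at sentence ends, so that sentence boundaries can fall anywhere inside a training segment.

```python
import re

# Assumed punctuation inventory; the abstract does not say which marks are predicted.
PUNCT = {".", ",", "?", "!"}

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def make_pairs(sentences, window=10):
    """Build (source, target) pairs for a hypothetical German-to-German system.

    source: `window` consecutive tokens with the marks in PUNCT removed
    target: the same tokens with those marks kept
    Sentences are concatenated first, so a window may span a sentence boundary.
    """
    tokens = [tok for sent in sentences for tok in tokenize(sent)]
    pairs, src, tgt = [], [], []
    for tok in tokens:
        # Start a new window only when the next word arrives, so punctuation
        # that follows the last word of a window stays with that window.
        if tok not in PUNCT and len(src) == window:
            pairs.append((" ".join(src), " ".join(tgt)))
            src, tgt = [], []
        tgt.append(tok)
        if tok not in PUNCT:
            src.append(tok)
    if tgt:
        pairs.append((" ".join(src), " ".join(tgt)))
    return pairs

if __name__ == "__main__":
    demo = [
        "Guten Morgen, meine Damen und Herren.",
        "Heute sprechen wir über Übersetzung.",
    ]
    for source, target in make_pairs(demo, window=4):
        print(source, "|||", target)
```

Pairs of this kind could be handed to any standard translation toolkit as parallel data; at test time the unpunctuated recognizer output would be cut into windows in the same way and "translated" into punctuated text before being passed to the German-English system.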


Bibliographic reference.  Cho, Eunah / Niehues, Jan / Waibel, Alex (2012): "Segmentation and punctuation prediction in speech language translation using a monolingual translation system", In IWSLT-2012, 252-259.