ISCA Archive Interspeech 2008

Stream decoding for simultaneous spoken language translation

Muntsin Kolss, Stephan Vogel, Alex Waibel

In the typical speech translation system, the first-best speech recognizer hypothesis is segmented into sentence-like units which are then fed to the downstream machine translation (MT) component. Both this intermediate segmentation step and the MT component need a sufficiently large context, which introduces delays that are undesirable in many application scenarios, such as real-time subtitling of foreign language broadcasts or simultaneous translation of speeches and lectures.

In this paper, we propose a statistical machine translation decoder which processes a continuous input stream, such as that produced by a run-on speech recognizer. By decoupling decisions about the timing of translation output generation from any fixed input segmentation, this design can guarantee a maximum output lag for each input word while allowing for full word reordering within this time window.
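
As an illustration of the bounded-lag idea only, the following is a minimal toy sketch and not the decoder described in the paper. It assumes lag is counted in input words, performs no actual translation, and stands in for the decoder's search with a hypothetical `reorder` callback that returns a permutation of the pending window. The names `BoundedLagStream`, `swap_noun_adjective`, and `ADJECTIVES` are invented for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the decoder's search: given the pending source
# words, return a permutation of their positions (the preferred output order).
Reorder = Callable[[List[str]], List[int]]


class BoundedLagStream:
    """Toy buffer: no word waits for more than max_lag later input words."""

    def __init__(self, max_lag: int, reorder: Reorder) -> None:
        if max_lag < 0:
            raise ValueError("max_lag must be non-negative")
        self.max_lag = max_lag
        self.reorder = reorder
        self.pending: List[Tuple[int, str]] = []  # (arrival index, word)
        self.clock = 0  # number of words received so far

    def _commit(self) -> List[str]:
        # Order the whole window, then emit only the prefix of that order
        # up to and including the oldest pending word; the rest stays pending.
        words = [w for _, w in self.pending]
        perm = self.reorder(words)
        cut = perm.index(0) + 1
        out = [words[i] for i in perm[:cut]]
        keep = set(perm[cut:])
        self.pending = [p for i, p in enumerate(self.pending) if i in keep]
        return out

    def push(self, word: str) -> List[str]:
        self.pending.append((self.clock, word))
        self.clock += 1
        out: List[str] = []
        # Commit whenever the oldest pending word has reached the lag limit.
        while self.pending and self.clock - 1 - self.pending[0][0] >= self.max_lag:
            out.extend(self._commit())
        return out

    def finish(self) -> List[str]:
        # End of stream: emit everything that is still pending.
        out: List[str] = []
        while self.pending:
            out.extend(self._commit())
        return out


ADJECTIVES = {"blanca"}  # toy lexicon, assumed for the example


def swap_noun_adjective(words: List[str]) -> List[int]:
    # Toy rule: if the second pending word is an adjective, output it first.
    order = list(range(len(words)))
    if len(words) >= 2 and words[1] in ADJECTIVES:
        order[0], order[1] = 1, 0
    return order


if __name__ == "__main__":
    stream = BoundedLagStream(max_lag=1, reorder=swap_noun_adjective)
    output: List[str] = []
    for w in ["la", "casa", "blanca", "es", "grande"]:
        output.extend(stream.push(w))
    output.extend(stream.finish())
    print(" ".join(output))  # prints "la blanca casa es grande"
```

In this sketch the moment of output is driven by the lag limit rather than by a sentence boundary, while words inside the pending window may still change places before they are committed.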

Experimental results show that this system achieves competitive translation performance with a minimum of translation-induced latency.

doi: 10.21437/Interspeech.2008-678

Cite as: Kolss, M., Vogel, S., Waibel, A. (2008) Stream decoding for simultaneous spoken language translation. Proc. Interspeech 2008, 2735-2738, doi: 10.21437/Interspeech.2008-678

  author={Muntsin Kolss and Stephan Vogel and Alex Waibel},
  title={{Stream decoding for simultaneous spoken language translation}},
  booktitle={Proc. Interspeech 2008},