NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation

Eunah Cho, Jan Niehues, Alex Waibel


Insertion of proper segmentation and punctuation into an ASR transcript is crucial not only for the performance of subsequent applications but also for the readability of the text. In a simultaneous spoken language translation system, the segmentation model has to fulfill real-time constraints and minimize latency as well.

In this paper, we show the successful integration of an attentional encoder-decoder-based segmentation and punctuation insertion model into a real-time spoken language translation system. The proposed technique can be easily integrated into the real-time framework and improves punctuation performance on reference transcripts as well as on ASR outputs. Our experiments on end-to-end spoken language translation show that, compared to the conventional language model and prosody-based model, adopting the NMT-based punctuation model improves translation performance by 1.3 BLEU points while maintaining low latency.


DOI: 10.21437/Interspeech.2017-1320

Cite as: Cho, E., Niehues, J., Waibel, A. (2017) NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation. Proc. Interspeech 2017, 2645-2649, DOI: 10.21437/Interspeech.2017-1320.


@inproceedings{Cho2017,
  author={Eunah Cho and Jan Niehues and Alex Waibel},
  title={NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2645--2649},
  doi={10.21437/Interspeech.2017-1320},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1320}
}