Insertion of proper segmentation and punctuation into an ASR transcript
is crucial not only for the performance of subsequent applications
but also for the readability of the text. In a simultaneous spoken
language translation system, the segmentation model has to fulfill
real-time constraints and minimize latency as well.
In this paper, we show the successful integration of an attentional
encoder-decoder-based
segmentation and punctuation insertion model into a real-time spoken
language translation system. The proposed technique can be easily integrated
into the real-time framework and improves punctuation performance
on reference transcripts as well as on ASR outputs. Our experiments
on end-to-end spoken language translation show that, compared to the
conventional language model and prosody-based model, adopting the
NMT-based punctuation model improves translation performance by
1.3 BLEU points while maintaining low latency.
Cite as: Cho, E., Niehues, J., Waibel, A. (2017) NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation. Proc. Interspeech 2017, 2645-2649, doi: 10.21437/Interspeech.2017-1320
@inproceedings{cho17_interspeech,
  author={Eunah Cho and Jan Niehues and Alex Waibel},
  title={{NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2645--2649},
  doi={10.21437/Interspeech.2017-1320}
}