Survey Talk: A Survey on Speech Translation

Jan Niehues

We will start with an overview of the different use cases and difficulties of speech translation. Due to the wide range of possible applications, these systems differ in the available data, the difficulty of the language and the extent of spontaneous speech effects. Furthermore, the interaction with human users has an important influence. In the main part of the talk, we will review state-of-the-art methods for building speech translation systems. We will start by reviewing the traditional approach to spoken language translation, a cascade of an automatic speech recognition system and a machine translation system, and highlight the challenges that arise when combining the two systems. In particular, techniques for adapting the systems to the target scenario will be reviewed. With the success of neural models in both areas, research interest in end-to-end speech translation is rising. While this approach shows promising results, international evaluation campaigns like the Shared Task of the International Workshop on Spoken Language Translation (IWSLT) have shown that cascaded systems currently still often achieve better translation performance. We will highlight the main challenges of end-to-end speech translation. In the final part of the talk, we will review techniques that address key challenges of speech translation, e.g. latency, spontaneous effects, sentence segmentation and stream decoding.

Cite as: Niehues, J. (2019) Survey Talk: A Survey on Speech Translation. Proc. Interspeech 2019.

