ISCA Archive Interspeech 2021

Towards Simultaneous Machine Interpretation

Alejandro Pérez-González-de-Martos, Javier Iranzo-Sánchez, Adrià Giménez Pastor, Javier Jorge, Joan-Albert Silvestre-Cerdà, Jorge Civera, Albert Sanchis, Alfons Juan

Automatic speech-to-speech translation (S2S) is one of the most challenging speech and language processing tasks, especially when considering its application to real-time settings. Recent advances in streaming Automatic Speech Recognition (ASR), simultaneous Machine Translation (MT) and incremental neural Text-To-Speech (TTS) make it possible to develop real-time cascade S2S systems with greatly improved accuracy. On the way to simultaneous machine interpretation, a state-of-the-art cascade streaming S2S system is described and empirically assessed in the simultaneous interpretation of European Parliament debates. We pay particular attention to the TTS component, both in terms of speech naturalness under a variety of response-time settings and in terms of speaker similarity for its cross-lingual voice cloning capabilities.


doi: 10.21437/Interspeech.2021-201

Cite as: Pérez-González-de-Martos, A., Iranzo-Sánchez, J., Pastor, A.G., Jorge, J., Silvestre-Cerdà, J.-A., Civera, J., Sanchis, A., Juan, A. (2021) Towards Simultaneous Machine Interpretation. Proc. Interspeech 2021, 2277-2281, doi: 10.21437/Interspeech.2021-201

@inproceedings{perezgonzalezdemartos21_interspeech,
  author={Alejandro Pérez-González-de-Martos and Javier Iranzo-Sánchez and Adrià Giménez Pastor and Javier Jorge and Joan-Albert Silvestre-Cerdà and Jorge Civera and Albert Sanchis and Alfons Juan},
  title={{Towards Simultaneous Machine Interpretation}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2277--2281},
  doi={10.21437/Interspeech.2021-201}
}