Real-time speech-to-speech (S2S) translation of lectures and speeches requires simultaneous translation with low latency to keep listeners continually engaged. However, simultaneous speech-to-speech translation systems have predominantly repurposed translation models that are typically trained for consecutive translation, without a motivated attempt to model incrementality. Furthermore, the notion of simultaneous interpretation is simplified to translation plus simultaneity. In contrast, human interpreters perform simultaneous interpretation by generating target speech incrementally with a very low ear-voice span, using a variety of strategies such as compression (paraphrasing), incremental comprehension, and anticipation through discourse inference and expectation of discourse redundancies. Exploiting and modeling such phenomena can potentially improve automatic real-time translation of speech. As a first step, in this work we identify and present a systematic analysis of the phenomena human interpreters use to perform simultaneous interpretation, and elucidate how they can be exploited in a conventional simultaneous translation framework. We perform our study on a corpus of simultaneous interpretation of parliamentary speeches in English and Spanish. Specifically, we present an empirical analysis of factors such as time constraint, redundancy, and inference as evidenced in the simultaneous interpretation corpus.
Bibliographic reference. Sridhar, Vivek Kumar Rangarajan / Chen, John / Bangalore, Srinivas (2013): "Corpus analysis of simultaneous interpretation data for improving real time speech translation", in INTERSPEECH 2013, pp. 3468-3472.