16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

“Speech is Silver, but Silence is Golden”: Improving Speech-to-Speech Translation Performance by Slashing Users Input

Frederic Bechet, Benoit Favre, Mickael Rouvier

LIF (UMR 7279), France

Speech-to-speech translation is a challenging task that combines two of the most ambitious Natural Language Processing challenges: Machine Translation (MT) and Automatic Speech Recognition (ASR). Recent advances in both fields have led to operational systems that achieve good performance when used in conditions matching those under which the ASR and MT models were trained. Regardless of the quality of these models, errors are inevitable, owing both to technical limitations of the systems (e.g. closed vocabulary) and to the intrinsic ambiguities of spoken language. However, not all ASR and MT errors have the same impact on the usability of a given speech-to-speech dialog system. Some are benign and are unconsciously corrected by users, while others can damage the understanding between users and eventually cause the dialog to fail. In this paper we present a strategy focusing on ASR error segments that have a strong negative impact on MT performance. The proposed method first automatically detects these erroneous segments and then estimates their impact on MT. We show that removing such segments prior to translation can significantly decrease the translation error rate, even without any correction strategy.
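The detect, estimate, and remove pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the error detector or the impact model, so this sketch assumes, hypothetically, that each ASR word already carries an error probability from some detector, and it uses a placeholder impact score (the summed error probability of a segment) in place of a real MT impact estimator. The names `Token`, `error_segments`, `summed_error_impact`, `slash_input` and both thresholds are illustrative only, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Token:
    word: str
    error_prob: float  # hypothetical: detector's probability that this ASR word is wrong


def error_segments(tokens: List[Token], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group contiguous tokens whose error probability exceeds `threshold`
    into (start, end) spans, with `end` exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, tok in enumerate(tokens):
        if tok.error_prob > threshold:
            if start is None:
                start = i  # open a new error segment
        elif start is not None:
            segments.append((start, i))  # close the current segment
            start = None
    if start is not None:
        segments.append((start, len(tokens)))  # segment runs to the end
    return segments


def summed_error_impact(segment: List[Token]) -> float:
    """Placeholder impact estimate (an assumption, not a real MT impact model):
    the sum of the segment's error probabilities."""
    return sum(tok.error_prob for tok in segment)


def slash_input(
    tokens: List[Token],
    impact_fn: Callable[[List[Token]], float],
    err_threshold: float = 0.5,
    impact_threshold: float = 1.0,
) -> List[str]:
    """Drop detected error segments whose estimated impact reaches
    `impact_threshold`; keep everything else for translation."""
    drop = set()
    for start, end in error_segments(tokens, err_threshold):
        if impact_fn(tokens[start:end]) >= impact_threshold:
            drop.update(range(start, end))
    return [tok.word for i, tok in enumerate(tokens) if i not in drop]


# Toy ASR hypothesis with made-up error probabilities (illustrative only).
hypothesis = [
    Token("i", 0.05), Token("want", 0.1), Token("two", 0.8), Token("a", 0.7),
    Token("ticket", 0.1), Token("to", 0.1), Token("paris", 0.6),
]

if __name__ == "__main__":
    # "two a" forms a high-impact segment and is removed; "paris" is flagged
    # but its estimated impact stays below the threshold, so it is kept.
    print(slash_input(hypothesis, summed_error_impact))
```

In this toy setup the slashed input passed to MT would be `i want ticket to paris`. Separating detection (`error_segments`) from impact estimation (`impact_fn`) mirrors the two-step structure in the abstract and lets a trained impact predictor replace the placeholder without changing the removal logic.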


Bibliographic reference. Bechet, Frederic / Favre, Benoit / Rouvier, Mickael (2015): "“Speech is Silver, but Silence is Golden”: Improving Speech-to-Speech Translation Performance by Slashing Users Input", In INTERSPEECH-2015, 2252-2256.