One use of text-to-speech synthesis (TTS) is as a component of speech-to-speech translation systems. The output of automatic machine translation (MT) can vary widely in quality, however. A synthetic voice that is extremely intelligible on naturally-occurring text may be far less intelligible when asked to render text that is automatically generated. In this paper, we compare the quality of synthesis of naturally-occurring text and its MT counterpart. We find that intelligibility of TTS on MT output is significantly lower than on either naturally-occurring text or semantically unpredictable sentences, and explore the reasons why.
Cite as: Tomokiyo, L.M., Peterson, K., Black, A.W., Lenzo, K.A. (2006) Intelligibility of machine translation output in speech synthesis. Proc. Interspeech 2006, paper 1268-Thu2A3O.2, doi: 10.21437/Interspeech.2006-610
@inproceedings{tomokiyo06_interspeech,
  author={Laura Mayfield Tomokiyo and Kay Peterson and Alan W. Black and Kevin A. Lenzo},
  title={{Intelligibility of machine translation output in speech synthesis}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1268-Thu2A3O.2},
  doi={10.21437/Interspeech.2006-610}
}