This study was carried out as part of a cooperation between INFOVOX AB, Sweden, and Laboratoire "Parole et Langage" URA CNRS 261, France. Our purpose is to quantify how far the prosody of a Text-to-Speech (TTS) system is perceived to deviate from natural French prosodic structures. We assume that neutralising all segmental information is an important methodological precaution: it makes it possible to determine whether the prosodic parameters of TTS systems can legitimately be studied independently of any segmental aspect of speech. To test the discriminatory power of pitch and pauses alone in the task of distinguishing between synthetic and natural speech, the spectral information of the original signal is reduced to a steady-amplitude synthetic [a] which has the same duration and F0 values as the original utterance. The evaluation shows that significantly different scores are assigned to natural and synthetic spectrum-reduced items. The identification of acceptable and faulty synthetic patterns produces a bimodal distribution of scores. However, the discriminatory power of prosodic features varies according to the specific TTS application.
Bibliographic reference. Nicolas, Pascale / Romeas, Pascal (1993): "Evaluation of prosody in the French version of multilingual text-to-speech synthesis: neutralising segmental information in preliminary tests", In EUROSPEECH'93, 211-214.
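The spectrum reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the paper used a synthetic [a], whereas this example substitutes a plain harmonic tone with no formant structure. The function name `f0_to_steady_tone`, the 10 ms frame length, the 16 kHz sample rate and the number of harmonics are all assumptions made for illustration. What the sketch keeps from the abstract is the principle: the output has the same duration and F0 contour as the input, a constant amplitude, and silence wherever the original has a pause.

```python
import math

def f0_to_steady_tone(f0_hz, frame_s=0.01, sample_rate=16000,
                      amplitude=0.5, n_harmonics=5):
    """Render an F0 contour as a constant-amplitude harmonic tone.

    f0_hz holds one F0 value (in Hz) per frame; a value <= 0 marks a
    pause or unvoiced frame and is rendered as silence.
    All parameter defaults are illustrative assumptions.
    """
    samples_per_frame = int(round(frame_s * sample_rate))
    # Upper bound on the harmonic sum, used to keep the peak <= amplitude.
    norm = sum(1.0 / k for k in range(1, n_harmonics + 1))
    phase = 0.0
    out = []
    for f0 in f0_hz:
        for _ in range(samples_per_frame):
            if f0 > 0:
                # Accumulate phase so pitch changes stay continuous.
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                s = sum(math.sin(k * phase) / k
                        for k in range(1, n_harmonics + 1))
                out.append(amplitude * s / norm)
            else:
                out.append(0.0)
    return out

# Hypothetical contour: two voiced frames, one pause frame, one voiced frame.
contour = [120.0, 125.0, 0.0, 110.0]
signal = f0_to_steady_tone(contour)
```

Because only F0 and pause timing survive in such a signal, a listener (or a scoring procedure) judging it has no segmental cues to rely on, which is the methodological point made in the abstract.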