Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Exploring the Naturalness of Several German High-Quality-Text-to-Speech Systems

Hansjörg Mixdorff, Dieter Mehnert

Dresden University of Technology, Germany

The synthesis of near-natural F0 contours is an important issue in text-to-speech (TTS) synthesis and is crucial to the naturalness and intelligibility of synthetic speech. In earlier studies by the first author, a model of German intonation was developed based on the quantitative Fujisaki model. The current paper describes a perception experiment comparing a TTS system incorporating this new approach with several German TTS systems of high segmental quality. Natural speech samples and a synthesis version with natural segment durations served as references. Results show that the natural speech samples unanimously received 10 points on a scale from 0 to 10. The best TTS systems cluster around a mean score of 5.0, whereas the variant with natural durations reached a mean score of 6.6, indicating the importance of closely modeling natural segment durations.
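For readers unfamiliar with the Fujisaki model, the following is a minimal sketch of its standard formulation. It is not the authors' German intonation model or their parameter settings. In the Fujisaki model, ln F0(t) is the sum of a base value ln Fb, phrase components (impulse responses of a critically damped second-order system, driven by phrase commands with amplitude Ap at onset T0), and accent components (step responses driven by accent commands with amplitude Aa between T1 and T2). The function names and the default values alpha = 2/s, beta = 20/s, gamma = 0.9 are illustrative assumptions, chosen as commonly cited typical values.

```python
import math

def phrase_component(t, alpha):
    """Gp(t): impulse response of the phrase-control mechanism (hypothetical helper)."""
    if t < 0:
        return 0.0
    return alpha ** 2 * t * math.exp(-alpha * t)

def accent_component(t, beta, gamma=0.9):
    """Ga(t): step response of the accent-control mechanism, capped at ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds) under the Fujisaki model.

    fb:      base frequency Fb in Hz
    phrases: list of (T0, Ap) phrase commands
    accents: list of (T1, T2, Aa) accent commands
    alpha, beta, gamma: assumed typical values, not taken from the paper
    """
    ln_f0 = math.log(fb)
    # Superimpose phrase components on the log-F0 baseline.
    for t0, ap in phrases:
        ln_f0 += ap * phrase_component(t - t0, alpha)
    # Each accent command contributes a smoothed rectangular pulse from T1 to T2.
    for t1, t2, aa in accents:
        ln_f0 += aa * (accent_component(t - t1, beta, gamma)
                       - accent_component(t - t2, beta, gamma))
    return math.exp(ln_f0)

# Example: a 2-second contour sampled every 10 ms with one phrase and one accent command.
contour = [fujisaki_f0(i * 0.01, 80.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
           for i in range(200)]
```

Because the components add in the log domain, phrase and accent commands scale F0 multiplicatively. Before the first command, the contour stays at Fb.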



Bibliographic reference.  Mixdorff, Hansjörg / Mehnert, Dieter (1999): "Exploring the naturalness of several German high-quality-text-to-speech systems", In EUROSPEECH'99, 1859-1862.