ISCA Archive Eurospeech 1999

Exploring the naturalness of several German high-quality-text-to-speech systems

Hansjörg Mixdorff, Dieter Mehnert

The synthesis of near-natural F0 contours is an important issue in text-to-speech synthesis and is crucial to the naturalness and intelligibility of synthetic speech. In earlier studies by the first author, a model of German intonation was developed that is based on the quantitative Fujisaki model. The current paper reports a perception experiment comparing a TTS system incorporating this new approach with several German TTS systems of high segmental quality. Natural speech samples and a synthesis version with natural segment durations were used as references. Results show that the natural speech samples unanimously received 10 points on a 0-to-10-point scale. The best TTS systems cluster around a mean score of 5.0, whereas the variant with natural durations reached a mean score of 6.6 points, indicating the importance of closely modeling natural segment durations.
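For readers unfamiliar with the Fujisaki model mentioned in the abstract, the following is a minimal Python sketch of its commonly cited formulation. In this formulation, ln F0(t) is the sum of a base value ln Fb, phrase components (impulse responses of a critically damped second-order system to phrase commands) and accent components (responses to rectangular accent commands, capped at a ceiling γ). This is not the first author's German intonation model. The function names and the default constants (α = 2/s, β = 20/s, γ = 0.9) are illustrative assumptions.

```python
import math


def phrase_component(t: float, alpha: float) -> float:
    """Gp(t): impulse response of the phrase control mechanism (zero for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_component(t: float, beta: float, gamma: float) -> float:
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb      -- base frequency Fb in Hz
    phrases -- list of (T0, Ap): onset time and magnitude of each phrase command
    accents -- list of (T1, T2, Aa): onset, offset and amplitude of each accent command
    alpha, beta, gamma -- assumed illustrative constants, not values from the paper
    """
    ln_f0 = math.log(fb)
    # Phrase commands add slowly decaying humps on the log-F0 scale.
    for t0, ap in phrases:
        ln_f0 += ap * phrase_component(t - t0, alpha)
    # Each accent command is a rectangular pulse: step up at T1, step down at T2.
    for t1, t2, aa in accents:
        ln_f0 += aa * (accent_component(t - t1, beta, gamma)
                       - accent_component(t - t2, beta, gamma))
    return math.exp(ln_f0)
```

For example, `[fujisaki_f0(i * 0.01, 80.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)]) for i in range(150)]` would sample a hypothetical 1.5-second contour at 10 ms steps, with one phrase command at 0 s and one accent command between 0.3 s and 0.6 s. Working on the log-F0 scale keeps the components additive, which is why the final value is obtained with `math.exp`.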


doi: 10.21437/Eurospeech.1999-406

Cite as: Mixdorff, H., Mehnert, D. (1999) Exploring the naturalness of several German high-quality-text-to-speech systems. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1859-1862, doi: 10.21437/Eurospeech.1999-406

@inproceedings{mixdorff99_eurospeech,
  author={Hansjörg Mixdorff and Dieter Mehnert},
  title={{Exploring the naturalness of several German high-quality-text-to-speech systems}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1859--1862},
  doi={10.21437/Eurospeech.1999-406}
}