Text-to-prosody systems that draw on prosodic databases extracted from natural speech will be key to the further development of text-to-speech systems.
This paper describes a system that uses such speech databases to generate the rhythm and intonation of texts written in French. The system is based on a very crude chinks 'n chunks prosodic phrasing algorithm and on an automatic prosodic analysis of a natural speech database. The rhythm of the synthetic speech is generated by a CART (classification and regression tree) trained on a large single-speaker speech corpus. The acoustic realization of the intonation is built from a set of prosodic patterns automatically extracted from the same speech corpus. At synthesis time, patterns are chosen on the fly from the database so as to minimize a total selection cost composed of pattern target costs and pattern concatenation costs.
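The selection step described above, choosing one pattern per position so that the total of target and concatenation costs is minimal, can be illustrated with a small dynamic-programming (Viterbi-style) search. This is only a sketch under stated assumptions: the abstract does not say which search procedure the authors used, and the function name `select_patterns`, the representation of a pattern as a `(start_f0, end_f0)` pair in Hz, and both toy cost functions are hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Pattern = Tuple[float, float]  # hypothetical: (start_f0, end_f0) in Hz


def select_patterns(
    targets: Sequence[float],
    candidates: Sequence[Sequence[Pattern]],
    target_cost: Callable[[float, Pattern], float],
    concat_cost: Callable[[Pattern, Pattern], float],
) -> Tuple[float, List[Pattern]]:
    """Pick one candidate per position, minimizing the summed
    target costs plus concatenation costs between neighbours."""
    if len(targets) != len(candidates):
        raise ValueError("need one candidate list per target")
    if not targets:
        return 0.0, []
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[i][j]: cheapest cost of any path ending in candidates[i][j]
    best = [[target_cost(targets[0], c) for c in candidates[0]]]
    # back[i][j]: index of the predecessor on that cheapest path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for cand in candidates[i]:
            k, cost = min(
                ((k, best[i - 1][k] + concat_cost(prev, cand))
                 for k, prev in enumerate(candidates[i - 1])),
                key=lambda pair: pair[1],
            )
            row_cost.append(cost + target_cost(targets[i], cand))
            row_back.append(k)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[Pattern] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return total, path


# Toy costs (assumptions, not the paper's definitions).
def toy_target_cost(target_mean_f0: float, pattern: Pattern) -> float:
    # Distance between the desired mean F0 and the pattern's mean F0.
    return abs((pattern[0] + pattern[1]) / 2 - target_mean_f0)


def toy_concat_cost(left: Pattern, right: Pattern) -> float:
    # F0 jump at the join between two consecutive patterns.
    return abs(left[1] - right[0])


total, chosen = select_patterns(
    targets=[120.0, 110.0],
    candidates=[[(118, 122), (110, 126)], [(126, 94), (100, 124)]],
    target_cost=toy_target_cost,
    concat_cost=toy_concat_cost,
)
print(total, chosen)
```

In this toy input the first candidate at position one matches its target best, but the second one joins the following pattern without any F0 jump, so the joint search prefers it. That is the practical difference between minimizing a total selection cost and picking the best-matching pattern at each position independently.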
Cite as: Malfrère, F., Dutoit, T., Mertens, P. (1998) Automatic prosody generation using suprasegmental unit selection. Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3), 323-328
@inproceedings{malfrere98_ssw,
  author={F. Malfrère and Thierry Dutoit and Piet Mertens},
  title={{Automatic prosody generation using suprasegmental unit selection}},
  year=1998,
  booktitle={Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3)},
  pages={323--328}
}