Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
Text-to-Prosody systems that rely on prosodic databases extracted from natural speech will be key to the further development of Text-to-Speech systems.
This paper describes a system that uses such speech databases to generate the rhythm and intonation of texts written in French. The system is based on a very crude "chinks 'n chunks" prosodic phrasing algorithm and on an automatic prosodic analysis of a natural speech database. The rhythm of the synthetic speech is generated with a CART tree trained on a large single-speaker speech corpus. The acoustic aspect of the intonation is obtained from a set of prosodic patterns automatically extracted from the same speech corpus. At synthesis time, patterns are chosen on the fly from the database so as to minimize a total selection cost composed of pattern target costs and pattern concatenation costs.
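The selection step described above (minimizing a sum of target costs and concatenation costs over a sequence of patterns) can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the function name select_patterns, the representation of a pattern as a (start F0, end F0) pair, and both toy cost functions are assumptions made only for illustration. The abstract does not say how the costs are defined or how the search is carried out; a Viterbi-style search is simply one common way to minimize a cost of this form.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # a prosodic pattern; its representation is hypothetical here


def select_patterns(
    candidates: Sequence[Sequence[P]],
    target_cost: Callable[[int, P], float],
    concat_cost: Callable[[P, P], float],
) -> Tuple[List[P], float]:
    """Pick one candidate per position so that the summed target and
    concatenation costs are minimal (Viterbi-style search)."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[i][j]: cheapest cost of any path ending in candidate j at position i
    best = [[target_cost(0, p) for p in candidates[0]]]
    # back[i][j]: index of the predecessor candidate on that cheapest path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        row, ptr = [], []
        for p in candidates[i]:
            costs = [
                best[i - 1][k] + concat_cost(q, p)
                for k, q in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(i, p))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[P] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


# Toy data (assumed): each pattern is a (start_f0, end_f0) pair in Hz.
targets = [95.0, 120.0]  # desired mean F0 per position
candidates = [
    [(90.0, 100.0), (100.0, 130.0)],
    [(100.0, 120.0), (130.0, 110.0)],
]


def toy_target_cost(i: int, p: Tuple[float, float]) -> float:
    # Distance between the pattern's mean F0 and the desired mean F0.
    return abs((p[0] + p[1]) / 2.0 - targets[i])


def toy_concat_cost(q: Tuple[float, float], p: Tuple[float, float]) -> float:
    # F0 jump at the junction between two consecutive patterns.
    return abs(q[1] - p[0])


path, total = select_patterns(candidates, toy_target_cost, toy_concat_cost)
```

In this toy example the second position's best target match, (130.0, 110.0), is not selected, because joining it to the first pattern would create a 30 Hz jump; the search trades a slightly worse target match for a smooth junction.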
Full Paper (with 2 sound examples linked from within the paper)
Bibliographic reference. Malfrère, F. / Dutoit, Thierry / Mertens, Piet (1998): "Automatic Prosody Generation Using Suprasegmental Unit Selection", In SSW3-1998, 323-328.