Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Intonation Modeling for TTS Using a Joint Extraction and Prediction Approach

Pablo Daniel Agüero, Antonio Bonafonte

TALP Research Center, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain

This paper presents a joint extraction and prediction framework for intonation modeling. The intonation model follows a superpositional approach using Bézier curves, with components attached to the minor phrase and the accent group. A greedy algorithm successively partitions the training data using linguistic information, and the parameters of each partition are obtained with a global optimization procedure. Because extraction is thus tightly coupled to the prediction step, the final performance improves. To test this hypothesis, several experiments compare the approach against a two-step intonation modeling procedure. The results show that prediction accuracy is higher than that of the reference method. The approach also avoids parameter extraction steps that can introduce additional noise, such as the interpolation step used in some intonation models.
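As a rough illustration of the superpositional idea described above, the following minimal sketch sums a phrase-level Bézier component and accent-group Bézier components to form an F0 contour. The abstract does not give the paper's actual parameterization, so the function names (`bezier`, `superpositional_f0`), the frame-based time axis, the curve orders, and the Hz values are all assumptions made for illustration only.

```python
from math import comb


def bezier(control_points, t):
    """Evaluate a scalar Bezier curve at t in [0, 1] using Bernstein polynomials."""
    n = len(control_points) - 1
    return sum(
        comb(n, i) * (1 - t) ** (n - i) * t ** i * p
        for i, p in enumerate(control_points)
    )


def superpositional_f0(phrase_coeffs, accent_groups, num_frames):
    """Sum a phrase-level component and accent-group components (hypothetical layout).

    accent_groups is a list of (start_frame, end_frame, coeffs) tuples,
    with end_frame exclusive. Each component is a Bezier curve over its own span.
    """
    contour = []
    for frame in range(num_frames):
        # Phrase component: normalized time over the whole phrase.
        t_phrase = frame / max(num_frames - 1, 1)
        value = bezier(phrase_coeffs, t_phrase)
        # Accent components: added only inside their own span.
        for start, end, coeffs in accent_groups:
            if start <= frame < end:
                t_local = (frame - start) / max(end - start - 1, 1)
                value += bezier(coeffs, t_local)
        contour.append(value)
    return contour


# Illustrative values only: a declining phrase line plus one accent bump.
contour = superpositional_f0([120.0, 100.0], [(0, 3, [0.0, 20.0, 0.0])], 5)
```

In a joint approach such as the one the abstract describes, the control points would be fitted globally for each partition of the training data rather than extracted contour by contour; the sketch covers only contour generation, not that optimization.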


Bibliographic reference. Agüero, Pablo Daniel / Bonafonte, Antonio (2004): "Intonation modeling for TTS using a joint extraction and prediction approach", in SSW5-2004, 67-72.