ISCA Archive SSW 2004

Intonation modeling for TTS using a joint extraction and prediction approach

Pablo Daniel Agüero, Antonio Bonafonte

This paper presents a joint extraction and prediction framework for intonation modeling. The intonation model is based on a superpositional approach using Bezier curves, with components attached to the minor phrase and the accent group. A greedy algorithm performs successive partitions of the training data using linguistic information. The parameters related to each partition are obtained using a global optimization procedure. In this way, the extraction process is closely tied to the prediction step, which improves the final performance. Several experiments test this hypothesis, using a two-step intonation modeling procedure as the reference for comparison. Results show that the prediction accuracy is higher than that of the reference method. The approach also avoids some parameter extraction steps that can introduce additional noise, such as the interpolation step used in some intonation models.
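
The superpositional idea in the abstract can be illustrated with a minimal sketch: a contour is formed by adding a Bezier curve for the minor phrase to a Bezier curve for the accent group. The abstract gives no implementation details, so the curve order, the normalization of time to [0, 1] within each unit, the function names `bezier` and `superposed_f0`, and all coefficient values below are assumptions made for illustration only, not the authors' method.

```python
from math import comb


def bezier(coeffs, t):
    """Evaluate a Bezier curve in Bernstein form at t in [0, 1].

    coeffs are the control values; the curve order is len(coeffs) - 1.
    """
    n = len(coeffs) - 1
    return sum(
        c * comb(n, i) * t**i * (1 - t) ** (n - i)
        for i, c in enumerate(coeffs)
    )


def superposed_f0(phrase_coeffs, accent_coeffs, t_phrase, t_accent):
    """Add a phrase-level and an accent-group-level component.

    t_phrase and t_accent are the (assumed) normalized positions of the
    same time instant inside the minor phrase and the accent group.
    """
    return bezier(phrase_coeffs, t_phrase) + bezier(accent_coeffs, t_accent)


# Hypothetical values: a flat phrase component plus a local accent bump.
phrase = [100.0, 100.0, 100.0]
accent = [0.0, 20.0, 0.0]
value = superposed_f0(phrase, accent, 0.5, 0.5)
```

In a joint approach such as the one described, coefficients like these would be fitted directly against the observed contour for each partition of the training data, rather than extracted per sentence first and predicted afterwards.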


Cite as: Agüero, P.D., Bonafonte, A. (2004) Intonation modeling for TTS using a joint extraction and prediction approach. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 67-72

@inproceedings{aguero04_ssw,
  author={Pablo Daniel Agüero and Antonio Bonafonte},
  title={{Intonation modeling for TTS using a joint extraction and prediction approach}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={67--72}
}