ISCA Archive Interspeech 2008

Multilevel parametric-base F0 model for speech synthesis

Javier Latorre, Masami Akamine

This paper proposes a new F0 model for speech synthesis based on a parameterization of the logF0 contour of each syllable. The parameterization consists of the N-th order discrete cosine transform (DCT) of the contour plus some additional parameters, such as the gradient of the syllable average pitch. A statistical model of the syllable pitch contour is then created by clustering the parameterized vectors with a decision tree. Similar statistical models are also created for linguistic levels other than the syllable. For synthesis, the statistical model of each level is used to define a log-likelihood function for the input text. These functions are then weighted and summed into a global log-likelihood function, which is maximized with respect to the DCT coefficients of the syllable model. The final logF0 contour is obtained by applying the inverse transform to the syllable DCT coefficients. A subjective test showed a clear preference for the proposed model over our previous HMM-based baseline.
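The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the orthonormal DCT-II convention, the diagonal Gaussian form of each level's model, and the assumption that every higher-level parameter vector is a linear function `A @ c` of the stacked syllable DCT coefficients `c` are all hypothetical choices made for illustration. Under those assumptions the weighted global log-likelihood is quadratic in `c`, so its maximum follows from a single linear solve.

```python
import numpy as np

def dct_matrix(n_coef, n_frames):
    """First n_coef orthonormal DCT-II basis vectors of length n_frames (as rows)."""
    k = np.arange(n_coef)[:, None]
    t = np.arange(n_frames)[None, :]
    basis = np.sqrt(2.0 / n_frames) * np.cos(np.pi * (t + 0.5) * k / n_frames)
    basis[0] /= np.sqrt(2.0)  # rescale the constant row so all rows are orthonormal
    return basis

def parameterize(logf0, n_coef):
    """Truncated DCT of one syllable's logF0 contour."""
    return dct_matrix(n_coef, len(logf0)) @ logf0

def reconstruct(coefs, n_frames):
    """Inverse transform: logF0 contour of n_frames frames from DCT coefficients."""
    return dct_matrix(len(coefs), n_frames).T @ coefs

def maximize_weighted_loglik(levels):
    """Maximize sum_l w_l * log N(A_l c; mean_l, diag(1 / prec_l)) over c.

    levels: list of (weight, A, mean, precision_diag) tuples, one per
    linguistic level. Assumption: each level is Gaussian and linear in c.
    """
    dim = levels[0][1].shape[1]
    lhs = np.zeros((dim, dim))
    rhs = np.zeros(dim)
    for weight, A, mean, prec in levels:
        lhs += weight * A.T @ (prec[:, None] * A)
        rhs += weight * A.T @ (prec * mean)
    # Setting the gradient of the quadratic objective to zero gives lhs @ c = rhs.
    return np.linalg.solve(lhs, rhs)

# Toy usage: two syllables with 3 DCT coefficients each (made-up numbers).
n_coef = 3
syl_mean = np.array([5.0, 0.2, -0.1, 4.8, -0.3, 0.05])   # syllable-level means
syllable_level = (1.0, np.eye(6), syl_mean, np.ones(6))
# Hypothetical phrase-level model on the mean of the two 0th coefficients.
A_phrase = np.array([[0.5, 0.0, 0.0, 0.5, 0.0, 0.0]])
phrase_level = (2.0, A_phrase, np.array([5.2]), np.array([1.0]))

c_opt = maximize_weighted_loglik([syllable_level, phrase_level])
contour_syl1 = reconstruct(c_opt[:n_coef], n_frames=20)  # logF0 of syllable 1
```

In this toy setup the phrase-level term pulls the average of the two 0th coefficients toward its own mean, while the remaining coefficients stay at their syllable-level means. The weights control how strongly each level influences the final contour.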

doi: 10.21437/Interspeech.2008-558

Cite as: Latorre, J., Akamine, M. (2008) Multilevel parametric-base F0 model for speech synthesis. Proc. Interspeech 2008, 2274-2277, doi: 10.21437/Interspeech.2008-558

  author={Javier Latorre and Masami Akamine},
  title={{Multilevel parametric-base F0 model for speech synthesis}},
  booktitle={Proc. Interspeech 2008},