Speech Prosody 2004
This paper deals with the automatic analysis and synthesis of intonation using Fujisaki's model. We propose an analysis method that imposes strong linguistic constraints. The method represents the F0 contour well when compared with other current methods that do not impose such constraints. Furthermore, the constrained parameters are less variable and more predictable, which makes this the best option for prediction (at least when accent commands are related to accent groups). Several prediction algorithms are evaluated. The results show that VCART (an extension of CART that predicts vector values) performs better than standard CART and neural networks. The paper also analyzes which features are most relevant for predicting the parameters of Fujisaki's model.
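For readers unfamiliar with Fujisaki's model, the following is a minimal sketch of its standard textbook formulation: log F0 is a base value plus the summed responses to phrase commands (impulses) and accent commands (steps). It is not the analysis or prediction method proposed in the paper. The function names and the default constants (alpha = 2.0/s, beta = 20.0/s, gamma = 0.9) are illustrative assumptions, chosen as commonly quoted values rather than taken from this abstract.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase component, Gp(t); zero before the command."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component, Ga(t), capped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- base frequency in Hz
    phrase_cmds -- list of (onset_time, amplitude) tuples
    accent_cmds -- list of (onset_time, offset_time, amplitude) tuples
    All parameter names and defaults are assumptions made for this sketch.
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

if __name__ == "__main__":
    # Hypothetical utterance: one phrase command and one accent command.
    phrases = [(0.0, 0.5)]
    accents = [(0.3, 0.6, 0.4)]
    for i in range(0, 11):
        t = i * 0.1
        print(f"{t:.1f} s  {fujisaki_f0(t, 100.0, phrases, accents):.1f} Hz")
```

In this formulation an utterance is described by a handful of command times and amplitudes, which is why predicting those parameters, as the paper evaluates, amounts to predicting the whole contour.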
Bibliographic reference. Agüero, Pablo Daniel / Wimmer, Klaus / Bonafonte, Antonio (2004): "Automatic analysis and synthesis of Fujisaki's intonation model for TTS", In SP-2004, 427-430.