Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Features for F0 Contour Prediction

Ted H. Applebaum, Nick Kibre, Steve Pearson

Panasonic Technologies Inc., Speech Technology Laboratory, Santa Barbara, CA, USA

Decision trees based on features derived from text analysis have previously been used to predict the input parameters of models of F0 contour for text-to-speech synthesis. Yet it is not known which features contribute most to the success of the prediction. This paper quantifies the dependence of the predicted F0 contour on each of several input features derived from the text.

Parameters of the Tilt intonation model of the F0 contour were predicted by decision trees trained on either 6 simple features or 17 features derived from a rule-based front end. To evaluate the contribution of each input feature, F0 prediction error measures were compared first within a group of predictors that each considered only a single input feature, and then within a second group of predictors that each ignored one of the input features.
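The two comparison groups described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the `train_and_score` callback are hypothetical, and the callback is assumed to train a predictor on the given feature subset and return a single F0 prediction error value.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def single_feature_subsets(features: Sequence[str]) -> Dict[str, List[str]]:
    """Group 1: one predictor per feature, each seeing only that feature."""
    return {f: [f] for f in features}


def leave_one_out_subsets(features: Sequence[str]) -> Dict[str, List[str]]:
    """Group 2: one predictor per feature, each seeing all features but that one."""
    return {f: [g for g in features if g != f] for f in features}


def compare_feature_contributions(
    features: Sequence[str],
    train_and_score: Callable[[List[str]], float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (error using each feature alone, error with each feature removed).

    `train_and_score` is a hypothetical callback: it is assumed to train a
    predictor (for example a decision tree) on the given feature subset and
    return one prediction error value measured on held-out data.
    """
    alone = {
        name: train_and_score(subset)
        for name, subset in single_feature_subsets(features).items()
    }
    without = {
        name: train_and_score(subset)
        for name, subset in leave_one_out_subsets(features).items()
    }
    return alone, without
```

Under this reading, a feature matters more when its error in the first dictionary is low (it predicts well on its own) or its error in the second dictionary is high (removing it hurts the predictor).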

F0 prediction error was measured on a new speaker by RMS deviation, mean absolute deviation and correlation. Similar trends were observed for each error measure.
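As a rough sketch, the three error measures could be computed over paired predicted and observed F0 values as shown here. The abstract does not specify the correlation type or the F0 scale, so Pearson correlation and equal-length sequences of F0 samples are assumptions, and the function names are illustrative.

```python
import math
from typing import Sequence


def rms_deviation(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square deviation between predicted and observed F0 values."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mean_absolute_deviation(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean of the absolute differences between predicted and observed F0 values."""
    n = len(predicted)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / n


def pearson_correlation(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed; the abstract says only 'correlation')."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    # Undefined when either sequence is constant (zero variance).
    return cov / math.sqrt(var_p * var_o)
```

The two deviation measures are errors (lower is better), while correlation is a similarity (higher is better), so "similar trends" across the three implies that the feature rankings agree once that difference in direction is taken into account.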

The features observed to most strongly affect F0 prediction were "position in the word of the following syllable", "percent of the way through a breath group", "presence of prosodic boundary at the end of the syllable" and "stress of the current syllable". These features are defined over different time scales and demonstrate how a local model of F0 contour can capture global properties.


Bibliographic reference. Applebaum, Ted H. / Kibre, Nick / Pearson, Steve (2000): "Features for F0 contour prediction", in ICSLP-2000, vol. 1, 629-632.