This paper describes the use of automatically trained models (Regression Trees and Multilayer Perceptrons) to predict three prosodic variables: phrase-boundary strength, word prominence and phoneme duration. The models are arranged in a cascade, so that the predicted phrase boundaries are used as input features to the prominence model, and the predicted prominences in turn feed the duration model. Cascades of this type have been built for six languages using specially constructed databases, and objective performance statistics are reported. For two languages (American English and Dutch), the results of a subjective evaluation experiment suggest that these prosodic models are at least as good as hand-crafted models, and sometimes better. Furthermore, preparing the training data automatically, rather than by manual labelling, seems to have no negative impact on model performance.
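The cascade idea can be illustrated with a minimal sketch. This is not the authors' implementation: each stage here is a hypothetical depth-1 regression tree (`Stump`) standing in for the paper's regression trees and MLPs, and the feature vectors and targets are invented toy values. What it shows is only the data flow: every stage is trained on the original features plus the *predictions* of all earlier stages (boundary, then prominence, then duration).

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Hypothetical depth-1 regression tree: one split, mean target per leaf."""
    feature: int = 0
    threshold: float = float("inf")
    left: float = 0.0
    right: float = 0.0

    def fit(self, X, y):
        # Start from a constant predictor (every row falls in the left leaf).
        mean_all = sum(y) / len(y)
        self.left = self.right = mean_all
        best_sse = float("inf")
        for j in range(len(X[0])):
            values = sorted(set(row[j] for row in X))
            # Candidate thresholds: midpoints between consecutive distinct values.
            for a, b in zip(values, values[1:]):
                t = (a + b) / 2
                lo = [yi for row, yi in zip(X, y) if row[j] <= t]
                hi = [yi for row, yi in zip(X, y) if row[j] > t]
                ml, mh = sum(lo) / len(lo), sum(hi) / len(hi)
                sse = sum((v - ml) ** 2 for v in lo) + sum((v - mh) ** 2 for v in hi)
                if sse < best_sse:  # keep the split with the lowest squared error
                    best_sse = sse
                    self.feature, self.threshold = j, t
                    self.left, self.right = ml, mh
        return self

    def predict(self, X):
        return [self.left if row[self.feature] <= self.threshold else self.right
                for row in X]


class Cascade:
    """Stages run in order; each stage's predictions become extra input features."""

    def __init__(self, n_stages):
        self.stages = [Stump() for _ in range(n_stages)]

    def fit(self, X, targets):
        feats = [list(row) for row in X]
        for stage, y in zip(self.stages, targets):
            stage.fit(feats, y)
            # Append this stage's predictions (not the gold labels) for the next stage.
            preds = stage.predict(feats)
            feats = [row + [p] for row, p in zip(feats, preds)]
        return self

    def predict(self, X):
        feats = [list(row) for row in X]
        outputs = []
        for stage in self.stages:
            preds = stage.predict(feats)
            outputs.append(preds)
            feats = [row + [p] for row, p in zip(feats, preds)]
        return outputs


# Toy data (invented): features = [punctuation follows?, syllable count]
X = [[1, 1], [0, 2], [0, 1], [1, 3]]
boundary = [3.0, 0.0, 0.0, 3.0]          # phrase-boundary strength
prominence = [2.0, 0.0, 0.0, 2.0]        # word prominence
duration = [120.0, 80.0, 60.0, 150.0]    # phoneme duration (ms)

model = Cascade(3).fit(X, [boundary, prominence, duration])
b_pred, p_pred, d_pred = model.predict(X)
print(b_pred)  # → [3.0, 0.0, 0.0, 3.0]
print(d_pred)  # → [135.0, 70.0, 70.0, 135.0]
```

Training each stage on upstream predictions rather than on the reference labels is one reasonable choice here, since at synthesis time only predictions are available; the abstract states that predicted boundaries are the prominence model's inputs, but the sketch's other details are assumptions.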
Cite as: Fackrell, J.W.A., Vereecken, H., Martens, J.-P., Van Coile, B. (1999) Multilingual prosody modelling using cascades of regression trees and neural networks. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1835-1838, doi: 10.21437/Eurospeech.1999-400
@inproceedings{fackrell99_eurospeech,
  author={J. W. A. Fackrell and H. Vereecken and J.-P. Martens and Bert Van Coile},
  title={{Multilingual prosody modelling using cascades of regression trees and neural networks}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1835--1838},
  doi={10.21437/Eurospeech.1999-400}
}