Sixth European Conference on Speech Communication and Technology
This paper describes the use of automatically trained models (Regression Trees and Multilayer Perceptrons) to predict three prosodic variables: phrase-boundary strength, word prominence and phoneme duration. The models are arranged in a cascade, so that the predicted phrase boundaries are used as input features to the prominence model, and the predicted prominences in turn serve as input features to the duration model. Cascade models of this type have been built for six languages using purpose-built databases, and objective performance statistics are reported. For two languages (American English and Dutch), the results of a subjective evaluation experiment suggest that these prosodic models are at least as good as hand-crafted models, and sometimes better. Furthermore, preparing the training data automatically, rather than by manual labelling, seems to have no negative impact on model performance.
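The cascade idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Each stage is a depth-1 regression tree (a "stump"), a deliberately simplified stand-in for the paper's Regression Trees and Multilayer Perceptrons. The names `Stump`, `fit_cascade` and `predict_cascade`, the toy features and the target values are all hypothetical. For simplicity every stage works on the same per-word rows, whereas phoneme durations would in practice be predicted per phoneme. What the sketch shows is the data flow: each stage is trained and run on the base features plus the predictions of all earlier stages.

```python
from dataclasses import dataclass
from typing import List, Sequence

Row = List[float]


@dataclass
class Stump:
    """Depth-1 regression tree: a hypothetical stand-in for a full tree or MLP."""
    feature: int = 0
    threshold: float = 0.0
    left: float = 0.0
    right: float = 0.0

    def fit(self, X: Sequence[Row], y: Sequence[float]) -> "Stump":
        best = None  # (sse, feature, threshold, left_mean, right_mean)
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [yi for row, yi in zip(X, y) if row[f] <= t]
                right = [yi for row, yi in zip(X, y) if row[f] > t]
                if not left or not right:
                    continue
                lm, rm = sum(left) / len(left), sum(right) / len(right)
                # Sum of squared errors of the two leaf means: the usual regression-tree split criterion.
                sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
                if best is None or sse < best[0]:
                    best = (sse, f, t, lm, rm)
        if best is None:  # no usable split: predict the mean everywhere
            mean = sum(y) / len(y)
            self.feature, self.threshold, self.left, self.right = 0, float("inf"), mean, mean
        else:
            _, self.feature, self.threshold, self.left, self.right = best
        return self

    def predict(self, row: Row) -> float:
        return self.left if row[self.feature] <= self.threshold else self.right


def fit_cascade(X: Sequence[Row], targets: Sequence[Sequence[float]]) -> List[Stump]:
    """Train one model per stage; each stage sees the base features plus all earlier predictions."""
    stages: List[Stump] = []
    inputs = [list(row) for row in X]
    for y in targets:
        model = Stump().fit(inputs, y)
        stages.append(model)
        # Feed this stage's predicted (not reference) values forward, as at synthesis time.
        inputs = [row + [model.predict(row)] for row in inputs]
    return stages


def predict_cascade(stages: Sequence[Stump], row: Row) -> List[float]:
    """Run the stages in order, appending each prediction to the next stage's input."""
    inputs, preds = list(row), []
    for model in stages:
        p = model.predict(inputs)
        preds.append(p)
        inputs = inputs + [p]
    return preds


if __name__ == "__main__":
    # Toy per-word features (hypothetical): [punctuation follows, word length].
    X = [[0, 3], [1, 5], [0, 7], [1, 2]]
    boundary = [0.0, 1.0, 0.0, 1.0]
    prominence = [0.0, 1.0, 1.0, 0.0]
    duration_ms = [80, 150, 150, 80]
    stages = fit_cascade(X, [boundary, prominence, duration_ms])
    print(predict_cascade(stages, [1, 5]))  # [1.0, 1.0, 150.0]
```

Training each stage on its predecessors' predictions, rather than on reference labels, is a design choice assumed here; it keeps the inputs seen during training consistent with those available when the cascade is run on new text.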
Bibliographic reference. Fackrell, J. W. A. / Vereecken, H. / Martens, J.-P. / Van Coile, B. (1999): "Multilingual prosody modelling using cascades of regression trees and neural networks", In EUROSPEECH'99, 1835-1838.