Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Multilingual Prosody Modelling Using Cascades of Regression Trees and Neural Networks

J. W. A. Fackrell (1), H. Vereecken (2), J.-P. Martens (2), Bert Van Coile (1,2)

(1) Lernout & Hauspie Speech Products NV, Ieper, Belgium
(2) ELIS, University of Gent, Gent, Belgium

This paper describes the use of automatically trained models (Regression Trees and Multilayer Perceptrons) to predict three prosodic variables: phrase-boundary strength, word prominence and phoneme duration. The models are arranged in a cascade, so that the phrase-boundary predictions are used as input features to the prominence model, and so on down the chain. Cascade models of this type have been constructed for six languages, using specially constructed databases, and objective performance statistics are reported. For two languages (American English and Dutch), the results of a subjective evaluation experiment suggest that these prosodic models are at least as good as hand-crafted models, and sometimes better. Furthermore, preparing the training data automatically, rather than by manual labelling, seems to have no negative impact on model performance.
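The cascade arrangement can be illustrated with a minimal sketch. This is not the authors' implementation: the `Stump` class is a hypothetical depth-1 regression tree standing in for the trained Regression Trees and Multilayer Perceptrons, the names `fit_cascade` and `predict_cascade` are assumptions, the data are made up, and all three targets are placed on one unit level for simplicity, whereas the paper's variables belong to phrase boundaries, words and phonemes respectively. The only point shown is that each stage's predictions are appended to the features seen by the next stage.

```python
from typing import List, Sequence


class Stump:
    """Depth-1 regression tree; a toy stand-in for the paper's trained models."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "Stump":
        mean = sum(y) / len(y)
        # Fallback: no usable split, always predict the overall mean.
        self.feature, self.threshold = 0, float("inf")
        self.left, self.right = mean, mean
        best_err = float("inf")
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                lo = [yi for row, yi in zip(X, y) if row[j] <= t]
                hi = [yi for row, yi in zip(X, y) if row[j] > t]
                if not lo or not hi:
                    continue
                ml, mh = sum(lo) / len(lo), sum(hi) / len(hi)
                err = sum((v - ml) ** 2 for v in lo) + sum((v - mh) ** 2 for v in hi)
                if err < best_err:
                    best_err = err
                    self.feature, self.threshold = j, t
                    self.left, self.right = ml, mh
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.left if row[self.feature] <= self.threshold else self.right
                for row in X]


def fit_cascade(X, targets):
    """Train one model per target, in cascade order.

    Each model is trained on the base features plus the predictions
    of every earlier stage.
    """
    feats = [list(row) for row in X]
    models = []
    for y in targets:
        model = Stump().fit(feats, y)
        models.append(model)
        preds = model.predict(feats)
        feats = [row + [p] for row, p in zip(feats, preds)]
    return models


def predict_cascade(models, X):
    """Run the stages in order, feeding each prediction forward."""
    feats = [list(row) for row in X]
    outputs = []
    for model in models:
        preds = model.predict(feats)
        outputs.append(preds)
        feats = [row + [p] for row, p in zip(feats, preds)]
    return outputs


# Made-up toy data: one base feature per unit, three targets in cascade order.
X = [[0.0], [1.0], [2.0], [3.0]]
boundary = [0.0, 0.0, 1.0, 1.0]
prominence = [1.0, 1.0, 3.0, 3.0]
duration = [50.0, 50.0, 90.0, 90.0]

models = fit_cascade(X, [boundary, prominence, duration])
outputs = predict_cascade(models, X)
```

In this sketch the later stages are trained on predicted, not reference, values of the earlier variables, so training and prediction see the same kind of input. Whether the paper trains its stages this way is not stated in the abstract.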


Bibliographic reference. Fackrell, J. W. A. / Vereecken, H. / Martens, J.-P. / Van Coile, Bert (1999): "Multilingual prosody modelling using cascades of regression trees and neural networks", in EUROSPEECH'99, 1835-1838.