Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Using Decision Trees within the Tilt Intonation Model to Predict F0 Contours

Kurt E. Dusterhoff, Alan W. Black, Paul Taylor

Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

This paper presents an intonation generation system for use in text-to-speech synthesis. The system uses classification trees to predict the location of intonation events and regression trees to predict parameters describing the F0 shape of each predicted event. The decision trees model intonation within the Tilt intonation model, which provides a parameterized description of fundamental frequency and an intuitive labelling scheme. The event location trees predict an event class (e.g. accent, boundary, none) for each syllable in an utterance based on local and global context (e.g. stress, phrasing, part of speech). The parameter prediction trees then provide the parameterized description of each intonation event based on similar context features. Informal results for the full system are presented together with results for the individual components.
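The two-stage structure described above can be illustrated with a minimal sketch. It is not the system from this paper: the hand-written rules stand in for trained classification and regression trees, the feature set is reduced to three illustrative features, and every numeric value is an arbitrary placeholder rather than a reported result. The names Syllable, TiltEvent, predict_event_class, predict_tilt_params and predict_events are hypothetical, and the choice of amplitude, duration and tilt as the event parameters is an assumption based on the usual Tilt description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Syllable:
    """Illustrative context features for one syllable (hypothetical set)."""
    text: str
    stressed: bool
    phrase_final: bool
    pos: str  # part of speech of the containing word


@dataclass
class TiltEvent:
    """Assumed Tilt-style description of one intonation event."""
    amplitude_hz: float
    duration_s: float
    tilt: float  # -1.0 = pure fall, +1.0 = pure rise


def predict_event_class(syl: Syllable) -> str:
    """Stage 1: stand-in for a classification tree over syllable context."""
    if syl.phrase_final:
        return "boundary"
    if syl.stressed and syl.pos in {"noun", "verb", "adj"}:
        return "accent"
    return "none"


def predict_tilt_params(syl: Syllable, event_class: str) -> Optional[TiltEvent]:
    """Stage 2: stand-in for regression trees, one per event parameter.

    All returned values are placeholders chosen for illustration only.
    """
    if event_class == "accent":
        amplitude = 25.0 if syl.pos == "noun" else 15.0
        return TiltEvent(amplitude_hz=amplitude, duration_s=0.2, tilt=0.0)
    if event_class == "boundary":
        return TiltEvent(amplitude_hz=10.0, duration_s=0.15, tilt=-1.0)
    return None  # no event on this syllable, so no parameters


def predict_events(
    utterance: List[Syllable],
) -> List[Tuple[str, str, Optional[TiltEvent]]]:
    """Run both stages over every syllable in the utterance."""
    results = []
    for syl in utterance:
        event_class = predict_event_class(syl)
        results.append((syl.text, event_class, predict_tilt_params(syl, event_class)))
    return results


utterance = [
    Syllable("the", stressed=False, phrase_final=False, pos="det"),
    Syllable("cat", stressed=True, phrase_final=False, pos="noun"),
    Syllable("sat", stressed=True, phrase_final=True, pos="verb"),
]
for text, event_class, params in predict_events(utterance):
    print(text, event_class, params)
```

In a trained system, each rule block would be replaced by a tree induced from labelled speech data, and the parameter stage would run only on syllables that the first stage marks as carrying an event.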

