7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Intonation Modelling for the Synthesis of Structured Documents

Jeska Buhmann (1), Jean-Pierre Martens (1), Lieve Macken (2), Bert Van Coile (1)

(1) Ghent University, Belgium; (2) ScanSoft, Belgium

Human readings of structured documents exhibit a much richer intonation than that observed in isolated sentences read aloud. Capturing this richness automatically with data-driven techniques is a challenge. In this paper, we extend our previous research on intonation modelling for isolated sentences in several respects: (i) the RNN (Recurrent Neural Network) intonation model is now trained and evaluated on read documents, (ii) the model is evaluated as part of the overall prosody model, (iii) the feature selection process is completely automated, and (iv) the importance of text-level features such as text type, text structure and typesetting is investigated. We demonstrate that acceptable intonation models can be constructed from a database that contains no explicit hand labelling of the intonation contours. The results also indicate that text type and text structure are important features, whereas typesetting is not.


Bibliographic reference. Buhmann, Jeska / Martens, Jean-Pierre / Macken, Lieve / Van Coile, Bert (2002): "Intonation modelling for the synthesis of structured documents", in ICSLP-2002, 2089-2092.