Intonation generation remains one of the weak links in the text-to-speech synthesis chain. Generating expressively neutral pitch contours with accurate placement of accents and phrase boundaries is hard enough; generating appropriate intonation for expressive speech is even more of a challenge. This paper is a first attempt at describing and categorizing the variation in pitch contours that occurs in expressive speech, a necessary step in the development of a new intonation model for expressive speech. The analysis is performed in the framework of the Generalized Linear Alignment model [10]. Hierarchical clustering of foot-based pitch contours revealed some interesting phenomena. Apart from the standard declining phrase curve, we observed phrase curves consisting of an incline, an optional plateau and a decline. These phrase curves are often observed on the last two feet of a minor or major phrase. In addition, the continuation rise associated with marking the end of a minor phrase occurred in only about 10% of the cases.
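As a rough illustration of what hierarchical clustering of foot-based pitch contours can look like, the sketch below groups a few toy contours by shape. It is not the procedure used in the paper: the resampling to a fixed number of points, the mean subtraction, the Euclidean distance, the average-linkage rule, and all function names and pitch values are assumptions made purely for illustration.

```python
from itertools import combinations


def resample(contour, n_points=5):
    """Linearly interpolate a pitch contour (Hz values) to n_points samples (n_points >= 2)."""
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(contour) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def normalize(contour):
    """Subtract the mean so clustering reflects contour shape, not pitch level (an assumption)."""
    mean = sum(contour) / len(contour)
    return [v - mean for v in contour]


def distance(a, b):
    """Euclidean distance between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(distance(vectors[i], vectors[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def cluster_contours(contours, n_clusters, n_points=5):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    vectors = [normalize(resample(c, n_points)) for c in contours]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return [sorted(c) for c in clusters]


# Made-up contours in Hz, one per foot: two declining, two incline-plateau-decline.
toy_contours = [
    [220, 210, 200, 190, 180],
    [180, 172, 165, 158, 150, 145],
    [150, 180, 200, 180, 150],
    [200, 230, 240, 240, 225, 200],
]
groups = cluster_contours(toy_contours, n_clusters=2)
print(groups)
```

With these toy values the two declining contours end up in one cluster and the two rise-fall contours in the other. On real data, the number of clusters, the distance measure and the linkage rule would all need to be chosen with care.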
Cite as: Klabbers, E., Santen, J.P.H.v. (2004) Clustering of foot-based pitch contours in expressive speech. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 73-78
@inproceedings{klabbers04_ssw,
  author={Esther Klabbers and Jan P. H. van Santen},
  title={{Clustering of foot-based pitch contours in expressive speech}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={73--78}
}