Apple's next-generation text-to-speech system in Mac OS X uses a superpositional pitch model, comprising a relatively smooth underlying F0 contour and a separate contribution from the influence of the phonetic segments. This paper focuses on the data-driven modelling of the underlying contour, based on electroglottographic signals obtained from a corpus of reiterant speech. F0 extraction from such signals yields more accurate characteristic shapes, as shown objectively by a typically low mean absolute frequency deviation (between 2 and 3 Hz) between original and synthetic F0 contours. This in turn supports a better (both more complete and more realistic) model of F0 behavior. Experimental results illustrate the improved prosodic representation resulting from this F0 model.
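The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes that the superposition is a frame-by-frame sum in Hz and that the deviation is averaged over time-aligned frames of equal count; the abstract specifies neither, and the function names and sample values below are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def superpose(smooth_hz: Sequence[float], segmental_hz: Sequence[float]) -> List[float]:
    """Combine a smooth underlying F0 contour with a segmental contribution.

    Assumption: the two components add frame by frame in Hz. The paper may
    combine them differently (for example, in a log-frequency domain).
    """
    if len(smooth_hz) != len(segmental_hz):
        raise ValueError("contours must have the same number of frames")
    return [s + p for s, p in zip(smooth_hz, segmental_hz)]


def mean_abs_f0_deviation(original_hz: Sequence[float], synthetic_hz: Sequence[float]) -> float:
    """Mean absolute frequency deviation (Hz) between two aligned F0 contours.

    Assumption: both contours are sampled at the same frames, so they can be
    compared element by element.
    """
    if len(original_hz) != len(synthetic_hz):
        raise ValueError("contours must have the same number of frames")
    if not original_hz:
        raise ValueError("contours must not be empty")
    total = sum(abs(o - s) for o, s in zip(original_hz, synthetic_hz))
    return total / len(original_hz)


# Hypothetical contours, for illustration only (not data from the paper).
original = [120.0, 125.0, 130.0, 128.0]
synthetic = [118.0, 127.0, 133.0, 126.0]
print(mean_abs_f0_deviation(original, synthetic))  # 2.25
```

A value in the 2 to 3 Hz range, as reported in the abstract, would mean that the synthetic contour tracks the original to within a few hertz per frame on average.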
Cite as: Silverman, K.E.A., Bellegarda, J.R., Lenzo, K.A. (2001) Smooth contour estimation in data-driven pitch modelling. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1167-1170, doi: 10.21437/Eurospeech.2001-305
@inproceedings{silverman01_eurospeech, author={Kim E. A. Silverman and Jerome R. Bellegarda and Kevin A. Lenzo}, title={{Smooth contour estimation in data-driven pitch modelling}}, year=2001, booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)}, pages={1167--1170}, doi={10.21437/Eurospeech.2001-305} }