Second ESCA/IEEE Workshop on Speech Synthesis
September 12-15, 1994
This work develops a new model of fundamental frequency (F0) generation that incorporates traditional methods of F0 modeling but also has parameters that can be automatically estimated from prosodically labeled speech. We generate F0 with a state-space dynamical system model, which assumes an unobserved state vector underlying the noisy observations of F0 and energy. The model parameters are specified to capture segment-, syllable-, and/or phrase-level effects. Because the state vector is never observed and F0 is missing in unvoiced segments, we use a non-traditional parameter estimation method based on an EM algorithm developed for speech recognition applications. In experiments on an independent test set, we obtained an RMS error of 33 Hz for F0.
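As a rough illustration of how a state-space model can track F0 through frames where it is unobserved, here is a minimal sketch. It is not the authors' model. It assumes a scalar hidden state, a single F0 observation (energy is ignored), and fixed, hand-picked parameters `a`, `b`, `q`, `r` (all hypothetical). The EM estimation of parameters described above is not shown. Unvoiced frames are marked `None` and receive only the Kalman prediction step. The RMS helper scores voiced frames only, which is also an assumption about how such an error might be computed.

```python
import math
from typing import List, Optional


def kalman_filter_missing(
    y: List[Optional[float]],
    a: float, b: float, q: float, r: float,
    x0: float, p0: float,
) -> List[float]:
    """Scalar Kalman filter for a hypothetical linear F0 model.

    Dynamics:     x[t+1] = a * x[t] + b + w,  w ~ N(0, q)
    Observation:  y[t]   = x[t] + v,          v ~ N(0, r)
    Frames with y[t] is None (e.g. unvoiced) skip the update step,
    so the estimate simply follows the dynamics through them.
    (x0, p0) is the prior mean/variance of the state at frame 0.
    Returns the filtered state mean for every frame.
    """
    x, p = x0, p0
    means = []
    for t, obs in enumerate(y):
        if t > 0:
            # Predict: propagate mean and variance through the dynamics.
            x = a * x + b
            p = a * a * p + q
        if obs is not None:
            # Update: blend the prediction with the observed F0 value.
            k = p / (p + r)
            x = x + k * (obs - x)
            p = (1.0 - k) * p
        means.append(x)
    return means


def rms_error_voiced(pred: List[float], ref: List[Optional[float]]) -> float:
    """RMS error in Hz over frames where the reference F0 is defined."""
    errs = [(p - r) ** 2 for p, r in zip(pred, ref) if r is not None]
    return math.sqrt(sum(errs) / len(errs))


# Usage: a falling contour with a two-frame unvoiced gap.
observed = [120.0, 118.0, None, None, 110.0, 108.0]
means = kalman_filter_missing(observed, a=1.0, b=-2.0, q=1.0, r=4.0,
                              x0=120.0, p0=4.0)
print([round(m, 1) for m in means])
print(round(rms_error_voiced(means, observed), 2))
```

Skipping the update on unvoiced frames is the standard way a Kalman filter handles missing observations; an EM procedure would wrap a smoother of this kind to re-estimate `a`, `b`, `q`, and `r` from data.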
Bibliographic reference. Ross, Ken / Ostendorf, Mari (1994): "A dynamical system model for generating F0 for synthesis", In SSW2-1994, 131-134.