INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Conditionally Linear Gaussian Models for Estimating Vocal Tract Resonances

Daniel Rudoy, Daniel N. Spendley, Patrick J. Wolfe

Harvard University, USA

Vocal tract resonances play a central role in the perception and analysis of speech. Here we consider the canonical task of estimating such resonances from an observed acoustic waveform, and formulate it as a statistical model-based tracking problem. In this vein, Deng and colleagues recently showed that a robust linearization of the formant-to-cepstrum map enables the effective use of a Kalman filtering framework. We extend this model in two ways: we account for the uncertainty of speech presence by way of a censored likelihood formulation, and we explicitly model formant cross-correlation via a vector autoregression. In doing so, we retain a conditionally linear and Gaussian framework amenable to efficient estimation schemes. We provide evaluations using a recently introduced public database of formant trajectories; the results indicate improvements of 20% to over 30% per formant in root mean square error, relative to a contemporary benchmark formant analysis tool.
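As a rough illustration of the kind of tracker the abstract describes, the following minimal sketch runs one predict/update step of a Kalman filter whose state follows a first-order vector autoregression. It is not the authors' implementation. The function name `kalman_step`, all numerical values, and the purely linear observation matrix `H` (a stand-in for the linearized formant-to-cepstrum map) are assumptions made for illustration. The `speech_present` flag simply skips the measurement update, which is a much cruder device than the censored likelihood formulation named above.

```python
import numpy as np

def kalman_step(x, P, y, F, Q, H, R, speech_present=True):
    """One predict/update step of a Kalman filter with VAR(1) state dynamics."""
    # Predict: x_t = F x_{t-1} + w_t with w_t ~ N(0, Q).
    # Off-diagonal entries of F couple the formants (cross-correlation).
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    if not speech_present:
        # Simplifying assumption: no usable observation, so keep the prediction.
        return x_pred, P_pred
    # Update with the linear observation model y_t = H x_t + v_t, v_t ~ N(0, R).
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T     # gain P_pred H^T S^{-1} (S, P_pred symmetric)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical two-formant example (all values are illustrative assumptions).
x0 = np.array([500.0, 1500.0])               # initial formant estimates in Hz
P0 = np.eye(2) * 1e4                         # initial state covariance
F = np.array([[0.95, 0.02], [0.02, 0.95]])   # VAR(1) transition with cross terms
Q = np.eye(2) * 100.0                        # process noise covariance
H = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]) * 1e-3  # stand-in linearized map
R = np.eye(3) * 0.01                         # observation noise covariance
y = H @ (F @ x0)                             # synthetic, noise-free observation

x1, P1 = kalman_step(x0, P0, y, F, Q, H, R, speech_present=True)
```

Calling the function with `speech_present=False` returns the predicted mean and covariance unchanged, so uncertainty grows across frames without speech and shrinks again once observations resume.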


Bibliographic reference. Rudoy, Daniel / Spendley, Daniel N. / Wolfe, Patrick J. (2007): "Conditionally linear Gaussian models for estimating vocal tract resonances", in INTERSPEECH-2007, 526-529.