Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

A Statistical Coarticulatory Model for the Hidden Vocal-Tract-Resonance Dynamics

Li Deng, Jeff Ma

Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada

A statistical coarticulatory model is presented for spontaneous speech recognition. Knowledge of the dynamic, target-directed behavior of the vocal tract resonances that produce highly coarticulated speech is built into the recognizer's design, its training, and its likelihood computation. The main advantage of the new model over the conventional HMM is its compact internal structure. This structure parsimoniously represents long-span context dependence in the observable domain of speech acoustics, without adding context-dependent model parameters. The model is formulated mathematically as a constrained, nonstationary, nonlinear dynamic system. A version of the generalized EM algorithm is developed and implemented for it, to learn the compact set of model parameters automatically. Speech recognition experiments on spontaneous speech data from the SWITCHBOARD corpus are reported.
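The abstract does not give the state equation. The sketch below assumes the first-order, target-directed form commonly used for hidden VTR dynamics, z[k+1] = phi * z[k] + (1 - phi) * target + w[k]. It shows how a hidden resonance moves toward each segment's target and carries its state across segment boundaries. That carry-over is how such a model can express context dependence without context-dependent parameters. The function name `simulate_vtr`, the parameter `phi`, and all numeric values are hypothetical. The nonlinear mapping from hidden resonances to acoustic observations is omitted.

```python
import random

def simulate_vtr(targets, durations, phi=0.9, z0=500.0, noise_std=0.0, seed=0):
    """Simulate a 1-D hidden VTR trajectory (e.g. one resonance in Hz).

    Hypothetical first-order target-directed state equation (an assumption,
    not taken from the paper):
        z[k+1] = phi * z[k] + (1 - phi) * target + w[k],  w[k] ~ N(0, noise_std^2)
    The state is NOT reset at segment boundaries, so each segment's
    trajectory depends on the preceding segments (coarticulation).
    """
    rng = random.Random(seed)
    z = z0
    traj = []
    for target, n_frames in zip(targets, durations):
        for _ in range(n_frames):
            # Move a fraction (1 - phi) of the remaining distance toward the target.
            z = phi * z + (1.0 - phi) * target + rng.gauss(0.0, noise_std)
            traj.append(z)
    return traj

# Two hypothetical segments with targets 700 Hz and 300 Hz, 50 frames each.
trajectory = simulate_vtr([700.0, 300.0], [50, 50], phi=0.8, z0=500.0)
print(round(trajectory[49], 1), round(trajectory[50], 1), round(trajectory[99], 1))
```

A value of `phi` closer to 1 gives slower transitions. When segments are short, as in fast spontaneous speech, the trajectory therefore falls short of its targets ("target undershoot"). The same few target and rate parameters cover every context.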


Bibliographic reference.  Deng, Li / Ma, Jeff (1999): "A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics", In EUROSPEECH'99, 1499-1502.