8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Use of Neural Network Mapping and Extended Kalman Filter to Recover Vocal Tract Resonances from the MFCC Parameters of Speech

Li Deng (1), Roberto Togneri (2)

(1) Microsoft Corp, USA
(2) University of Western Australia

In this paper, we present a state-space formulation of a neural-network-based hidden dynamic model of speech whose parameters are trained using an approximate EM algorithm. Training uses the output of an off-the-shelf formant tracker (during vowel segments) to simplify the complex sufficient statistics that the exact EM algorithm would require. The trained model consists of a state equation for the target-directed vocal tract resonance (VTR) dynamics on all classes of speech sounds (including consonant closure) and an observation equation that maps the VTR to the acoustic measurement. This model is then used to recover the unobserved VTR with an extended Kalman filter. The results demonstrate accurate estimation of the VTRs, especially during rapid consonant-vowel or vowel-consonant transitions and during consonant closure, when the acoustic measurement alone provides weak or no information for inferring the VTR values.
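
The abstract does not give the model equations, so the following is only a minimal illustrative sketch of the general technique it names: an extended Kalman filter applied to a target-directed linear state equation with a neural-network observation equation. Everything specific here is an assumption made for illustration, not the authors' formulation: the state form x[k] = Phi x[k-1] + (I - Phi) target + w[k], the one-hidden-layer tanh network with random (untrained) weights standing in for the trained VTR-to-MFCC mapping, the single fixed target, and all names and parameter values.

```python
import numpy as np

def make_mlp(n_in, n_hidden, n_out, rng):
    """Hypothetical stand-in for a trained VTR-to-MFCC network (random weights)."""
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))
    b2 = np.zeros(n_out)
    return W1, b1, W2, b2

def mlp_forward(params, x):
    """Observation function h(x): one tanh hidden layer, linear output."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ x + b1) + b2

def mlp_jacobian(params, x):
    """Analytic Jacobian dh/dx, used by the EKF to linearize h around x."""
    W1, b1, W2, _ = params
    a = np.tanh(W1 @ x + b1)
    return W2 @ ((1.0 - a ** 2)[:, None] * W1)

def simulate(params, phi, target, Q, R, x0, n_frames, rng):
    """Draw a synthetic state trajectory and noisy observations from the assumed model."""
    n, m = x0.size, R.shape[0]
    x = x0.copy()
    states = np.empty((n_frames, n))
    obs = np.empty((n_frames, m))
    for k in range(n_frames):
        x = phi * x + (1.0 - phi) * target + rng.multivariate_normal(np.zeros(n), Q)
        states[k] = x
        obs[k] = mlp_forward(params, x) + rng.multivariate_normal(np.zeros(m), R)
    return states, obs

def ekf_track(obs, params, phi, target, Q, R, x0, P0):
    """Extended Kalman filter: returns filtered state estimates and final covariance."""
    n = x0.size
    I = np.eye(n)
    Phi = phi * I
    x, P = x0.copy(), P0.copy()
    estimates = np.empty((len(obs), n))
    for k, y in enumerate(obs):
        # Prediction step: the state relaxes toward the target at rate (1 - phi).
        x = Phi @ x + (I - Phi) @ target
        P = Phi @ P @ Phi.T + Q
        # Update step: linearize the network around the predicted state.
        H = mlp_jacobian(params, x)
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T  # equals P H^T S^{-1} since S and P are symmetric
        x = x + K @ (y - mlp_forward(params, x))
        P = (I - K @ H) @ P
        estimates[k] = x
    return estimates, P
```

In this sketch the state is a three-dimensional vector in kHz (an assumed normalization that keeps the tanh units out of saturation), and a call such as `ekf_track(obs, params, 0.9, target, Q, R, x0, P0)` returns one state estimate per frame. A real system would replace the random network with trained weights and switch the target per speech segment.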


Bibliographic reference. Deng, Li / Togneri, Roberto (2004): "Use of neural network mapping and extended Kalman filter to recover vocal tract resonances from the MFCC parameters of speech", in INTERSPEECH-2004, 1097-1100.