We address the problem of robust formant tracking in continuous speech. We propose a robust statistical model, the t-distribution mixture model (tMM), which operates on the "pyknogram" obtained through a multiband AM-FM demodulation technique. Statistical modeling of the pyknogram is shown to be more effective in handling the variability in the signal processing stage. Because the pyknogram contains outlier data, t-mixture density estimation is shown to be more effective than Gaussian mixture density estimation. For formant tracking, we show that the tMM is better in terms of parameter selection, accuracy, and smoothness of the estimate. We present experimental results on simulated data and real speech sentences, and we test the robustness of the proposed MDA-tMM method to additive noise. Comparisons with the PRAAT software and a recently developed adaptive filterbank technique show that the proposed MDA-tMM method is superior in several aspects.
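To illustrate why a t-mixture tolerates outliers better than a Gaussian mixture, here is a minimal sketch of expectation-maximization (EM) for a one-dimensional mixture. It uses the standard t-mixture updates: each sample receives a latent weight u = (nu + 1) / (nu + delta), so points far from a component's center pull its mean less. This is not the authors' MDA-tMM implementation. The function names (`fit_mixture`, `log_t_pdf`, `log_normal_pdf`), the fixed degrees of freedom `nu`, the initialization, and the synthetic "pyknogram-like" frequency samples are all hypothetical assumptions made for illustration. Passing `nu=None` turns the same loop into an ordinary Gaussian mixture (u = 1 for every sample).

```python
import math
import numpy as np


def log_t_pdf(x, mu, var, nu):
    """Log density of a 1-D Student-t with location mu, squared scale var, dof nu."""
    delta = (x - mu) ** 2 / var
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * np.log(nu * math.pi * var)
            - 0.5 * (nu + 1) * np.log1p(delta / nu))


def log_normal_pdf(x, mu, var):
    """Log density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * np.log(2 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def fit_mixture(x, means, n_iter=100, nu=None, var_floor=1.0):
    """EM for a 1-D mixture (hypothetical helper).

    nu=None fits a Gaussian mixture; a finite nu fits a Student-t mixture
    with fixed (not estimated) degrees of freedom. Returns (pi, mu, var).
    """
    x = np.asarray(x, dtype=float)
    mu = np.array(means, dtype=float)
    k = mu.size
    var = np.full(k, x.var())          # broad initial spread for every component
    pi = np.full(k, 1.0 / k)           # equal initial mixing weights
    for _ in range(n_iter):
        # E-step: log(pi_k) + log p_k(x_i), shape (n, k), normalized stably
        if nu is None:
            logp = log_normal_pdf(x[:, None], mu, var)
        else:
            logp = log_t_pdf(x[:, None], mu, var, nu)
        logw = np.log(pi) + logp
        logw -= logw.max(axis=1, keepdims=True)
        tau = np.exp(logw)
        tau /= tau.sum(axis=1, keepdims=True)
        # Latent scale weights: 1 for Gaussian, small for far-away points under t
        if nu is None:
            u = np.ones_like(tau)
        else:
            delta = (x[:, None] - mu) ** 2 / var
            u = (nu + 1) / (nu + delta)
        # M-step: mixing weights, weighted means, weighted variances
        nk = tau.sum(axis=0)
        pi = nk / x.size
        tu = tau * u
        mu = (tu * x[:, None]).sum(axis=0) / tu.sum(axis=0)
        var = (tu * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, var_floor)   # guard against collapsing components
    return pi, mu, var


if __name__ == "__main__":
    # Synthetic frequency samples (Hz): two "formant" clusters plus uniform outliers
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(500, 50, 200),
                        rng.normal(1500, 80, 200),
                        rng.uniform(0, 4000, 40)])
    _, mu_t, _ = fit_mixture(x, [400.0, 1600.0], nu=3, n_iter=200)
    _, mu_g, _ = fit_mixture(x, [400.0, 1600.0], nu=None, n_iter=200)
    print("t-mixture means:", mu_t)
    print("Gaussian mixture means:", mu_g)
```

The design choice to fix `nu` keeps the sketch short; a fuller treatment would also estimate `nu` per component, and a tracker would refit (or warm-start) the mixture frame by frame so that the component means form smooth formant trajectories.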
Bibliographic reference. Harshavardhan, Sundar / Seelamantula, Chandra Sekhar / Sreenivas, Thippur V. (2010): "A multimodal density function estimation approach to formant tracking", In INTERSPEECH-2010, 2410-2413.