ISCA Archive Interspeech 2005

Formant frequency prediction from MFCC vectors in noisy environments

Jonathan Darch, Ben Milner, Saeed Vaseghi

This paper proposes a method of predicting the formant frequencies of a frame of speech from its mel-frequency cepstral coefficient (MFCC) representation. Prediction is achieved by training a Gaussian mixture model (GMM) that models the joint density of formant frequencies and MFCCs. Given this GMM and an input MFCC vector, a maximum a posteriori (MAP) prediction of the formant frequencies is generated. Formant prediction accuracy is evaluated on both a constrained-vocabulary connected-digits database and a 5000-word large-vocabulary database. Experiments first examine how formant frequency prediction accuracy changes as the number of clusters in the GMM is varied; the best formant frequency prediction error obtained is 3.72%. Second, the effect of noise on formant prediction accuracy is examined. Accuracy falls as the signal-to-noise ratio decreases, but using a GMM matched to the noise conditions significantly improves formant prediction accuracy.
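The prediction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the MAP estimate, so the version below is an assumption: it selects the GMM cluster with the highest posterior probability given the MFCC vector, then returns that cluster's conditional mean of the formants. The function names, the stacking order [MFCC; formants], and the toy parameters are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def predict_formants(x, weights, means, covs, n_mfcc):
    """Predict formants from MFCC vector x with a joint GMM over [x; f].

    Assumed scheme (not confirmed by the abstract): choose the most
    probable cluster given x, then use that cluster's conditional mean.
    """
    # Unnormalised log posterior of each cluster, using only the MFCC
    # marginal of the joint Gaussian (top-left covariance block).
    log_post = np.array([
        np.log(w) + log_gaussian(x, m[:n_mfcc], c[:n_mfcc, :n_mfcc])
        for w, m, c in zip(weights, means, covs)
    ])
    k = int(np.argmax(log_post))

    mu_x, mu_f = means[k][:n_mfcc], means[k][n_mfcc:]
    cov_xx = covs[k][:n_mfcc, :n_mfcc]
    cov_fx = covs[k][n_mfcc:, :n_mfcc]
    # Conditional mean: mu_f + Sigma_fx Sigma_xx^{-1} (x - mu_x)
    return mu_f + cov_fx @ np.linalg.solve(cov_xx, x - mu_x)


# Toy joint model: 1 "MFCC" dimension and 1 "formant" dimension, 2 clusters.
toy_weights = np.array([0.5, 0.5])
toy_means = [np.array([0.0, 500.0]), np.array([5.0, 1500.0])]
toy_covs = [
    np.array([[1.0, 10.0], [10.0, 400.0]]),
    np.array([[1.0, -20.0], [-20.0, 900.0]]),
]
toy_prediction = predict_formants(
    np.array([0.5]), toy_weights, toy_means, toy_covs, n_mfcc=1
)
```

In a realistic setting the joint GMM parameters would be estimated from paired MFCC and formant training data, and a noise-matched model would simply be a second set of parameters trained on noisy speech; neither step is shown here.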

doi: 10.21437/Interspeech.2005-207

Cite as: Darch, J., Milner, B., Vaseghi, S. (2005) Formant frequency prediction from MFCC vectors in noisy environments. Proc. Interspeech 2005, 1129-1132, doi: 10.21437/Interspeech.2005-207

  author={Jonathan Darch and Ben Milner and Saeed Vaseghi},
  title={{Formant frequency prediction from MFCC vectors in noisy environments}},
  booktitle={Proc. Interspeech 2005},