ISCA Archive Interspeech 2009

Formant trajectories for acoustic-to-articulatory inversion

I. Yücel Özbek, Mark Hasegawa-Johnson, Mübeccel Demirekler

This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. Formant frequencies and formant spectral amplitudes are estimated automatically from audio and treated as observations for estimating electromagnetic articulography (EMA) coil positions. A mixture Gaussian regression model with mel-frequency cepstral coefficient (MFCC) observations is modified by using formants and energies to either replace or augment the MFCC observation vector. The augmented observation yields 3.4% lower RMS error and 2% higher correlation coefficient than the baseline MFCC observation. The improvement is especially pronounced for stop consonants, possibly because formant tracking provides information about the acoustic resonances that would otherwise be unavailable during stop closure and release.
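
The regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a joint Gaussian mixture over the stacked vector [acoustic observation, articulator position] whose parameters have already been trained, and it predicts the articulator position as the conditional mean given the acoustic observation. The function names (gmm_regression, rms_error, correlation) and the parameter layout are hypothetical and chosen only for illustration; the abstract does not specify the number of mixture components, the feature dimensions, or the training procedure.

```python
import numpy as np

def gmm_regression(x, weights, means, covs, dx):
    """Conditional mean E[y | x] under a joint Gaussian mixture over z = [x, y].

    x       : acoustic observation of length dx (assumed layout: e.g. MFCCs,
              optionally concatenated with formants and their energies)
    weights : mixture weights, one per component
    means   : joint mean vectors, one per component
    covs    : joint covariance matrices, one per component
    """
    x = np.asarray(x, dtype=float)
    log_resp, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        mu, S = np.asarray(mu, dtype=float), np.asarray(S, dtype=float)
        mu_x, mu_y = mu[:dx], mu[dx:]
        S_xx, S_yx = S[:dx, :dx], S[dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)  # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        # Log of weight times the marginal Gaussian density of x.
        log_resp.append(np.log(w) - 0.5 * (diff @ sol + logdet + dx * np.log(2 * np.pi)))
        # Per-component conditional mean of y given x.
        cond_means.append(mu_y + S_yx @ sol)
    log_resp = np.array(log_resp)
    resp = np.exp(log_resp - log_resp.max())  # numerically stable posterior
    resp /= resp.sum()
    return resp @ np.array(cond_means)

def rms_error(y_true, y_pred):
    """Root-mean-square error between measured and estimated trajectories."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def correlation(y_true, y_pred):
    """Pearson correlation coefficient between two one-dimensional trajectories."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

Under these assumptions, augmenting the observation amounts to concatenating the per-frame feature vectors (for example with np.concatenate) before they are passed in as x, while replacing it means passing the formant features alone; in both cases dx must match the length of the resulting vector.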

doi: 10.21437/Interspeech.2009-717

Cite as: Özbek, I.Y., Hasegawa-Johnson, M., Demirekler, M. (2009) Formant trajectories for acoustic-to-articulatory inversion. Proc. Interspeech 2009, 2807-2810, doi: 10.21437/Interspeech.2009-717

  author={I. Yücel Özbek and Mark Hasegawa-Johnson and Mübeccel Demirekler},
  title={{Formant trajectories for acoustic-to-articulatory inversion}},
  booktitle={Proc. Interspeech 2009},