We study empirically the best acoustic parameterization for articulatory inversion (the problem of recovering the sequence of vocal tract shapes that produce a given acoustic speech signal). We compare all combinations of the following factors: 1) popular acoustic features such as MFCC and PLP with and without dynamic features; 2) different short-time window lengths; 3) different levels of smoothing of the acoustic temporal trajectories. Experimental results on a real speech production database show consistent improvement when using features closely related to the vocal tract (in particular LSF), dynamic features, and large window length and smoothing (which reduce the jaggedness of the acoustic trajectory). Further improvements are obtained with a 15 ms time delay between acoustic and articulatory frames. However, the improvement attained over other combinations is very small (at most 0.3 mm RMSE).
Bibliographic reference. Qin, Chao / Carreira-Perpiñán, Miguel Á. (2007): "A comparison of acoustic features for articulatory inversion", In INTERSPEECH-2007, 2469-2472.