15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Sparse Smoothing of Articulatory Features from Gaussian Mixture Model Based Acoustic-to-Articulatory Inversion: Benefit to Speech Recognition

Prasad Sudhakar (1), Prasanta Kumar Ghosh (2)

(1) Université catholique de Louvain, Belgium
(2) Indian Institute of Science, India

This paper considers speech recognition using articulatory features estimated by acoustic-to-articulatory inversion (AAI). A recently proposed sparse smoothing approach is used to postprocess the estimates obtained from Gaussian mixture model (GMM) based AAI under the minimum mean squared error (MMSE) criterion. It is well known that low-pass smoothing as a postprocessing step improves AAI performance. Sparse smoothing, on the other hand, not only improves AAI performance but also preserves MMSE optimality for as many estimates as possible. In this work, we investigate the benefit of preserving MMSE optimality during postprocessing by using the smoothed articulatory estimates in a broad-class phonetic recognition task. Experimental results show that low-pass filter based smoothing leads to a significant drop in recognition accuracy compared with using articulatory estimates without any smoothing. In contrast, the recognition accuracy obtained with articulatory features from sparse smoothing is similar to that obtained with articulatory features taken directly from GMM-based AAI without any postprocessing. Thus, sparse smoothing improves inversion performance without sacrificing recognition accuracy, whereas low-pass smoothing improves inversion performance at the cost of recognition accuracy.
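
For readers unfamiliar with the MMSE criterion mentioned in the abstract, the following is a minimal sketch of the textbook GMM-based MMSE mapping: the articulatory estimate is the posterior-weighted sum of the per-component conditional means given the acoustic vector. It is not taken from the paper. The function names, the joint vector ordering (acoustic part first, articulatory part second), and the full-covariance parameterization are assumptions made for illustration, and the sparse smoothing step itself is not shown.

```python
import numpy as np

def gaussian_logpdf(z, mean, cov):
    """Log density of a multivariate Gaussian evaluated at z."""
    d = mean.shape[0]
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def gmm_mmse_estimate(z, weights, means, covs, dz):
    """MMSE estimate of the articulatory vector x given the acoustic vector z.

    Assumed layout (hypothetical, for illustration): each component models
    the joint vector [z; x], so the first dz entries of every mean and the
    top-left dz-by-dz block of every covariance belong to the acoustic part.
    """
    log_post = []
    cond_means = []
    for w, mu, S in zip(weights, means, covs):
        mu_z, mu_x = mu[:dz], mu[dz:]
        S_zz, S_xz = S[:dz, :dz], S[dz:, :dz]
        # Unnormalized log posterior of this component given z.
        log_post.append(np.log(w) + gaussian_logpdf(z, mu_z, S_zz))
        # Conditional mean E[x | z, component].
        cond_means.append(mu_x + S_xz @ np.linalg.solve(S_zz, z - mu_z))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()
    # Posterior-weighted sum of the conditional means.
    return post @ np.array(cond_means)
```

Applied frame by frame, this yields one MMSE-optimal articulatory estimate per acoustic frame. Any postprocessing that alters a frame's value moves that frame away from its MMSE estimate, which is the trade-off the abstract examines.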

Bibliographic reference. Sudhakar, Prasad / Ghosh, Prasanta Kumar (2014): "Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognition", in INTERSPEECH-2014, 169-173.