8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Acoustic-to-Articulatory Inversion Mapping with Gaussian Mixture Model

Tomoki Toda (1), Alan Black (2), Keiichi Tokuda (1)

(1) Nagoya Institute of Technology, Japan
(2) Carnegie Mellon University, USA

This paper describes acoustic-to-articulatory inversion mapping using a Gaussian Mixture Model (GMM). The correspondence between acoustic and articulatory parameters is modeled by a GMM trained on parallel acoustic-articulatory data. We measure the performance of the GMM-based mapping and investigate the effectiveness of using multiple acoustic frames as an input feature and of using multiple mixture components. The results show that although increasing the number of mixture components reduces the estimation error, it introduces many discontinuities into the estimated articulatory trajectories. To address this problem, we apply maximum likelihood estimation (MLE) with articulatory dynamic features to the GMM-based mapping. Experimental results demonstrate that MLE using dynamic features estimates more appropriate articulatory movements than the GMM-based mapping smoothed by a lowpass filter.
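The core of the GMM-based mapping can be sketched as follows: a GMM is fitted on joint (acoustic, articulatory) vectors, and each acoustic frame is then mapped to the mixture-weighted conditional mean of the articulatory part. This is a minimal illustrative sketch (not the authors' implementation; toy data, dimensionalities, and the use of scikit-learn's GaussianMixture are assumptions), without the dynamic-feature MLE step:

```python
# Sketch of GMM-based inversion mapping: fit a joint GMM on stacked
# (acoustic, articulatory) vectors, then map an acoustic frame x to
# E[y | x], a responsibility-weighted sum of per-component
# conditional means. Toy data stands in for real parallel corpora.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy parallel data: 2-D "acoustic" features, 1-D "articulatory" feature.
x = rng.normal(size=(500, 2))
y = x[:, :1] ** 2 + 0.1 * rng.normal(size=(500, 1))
z = np.hstack([x, y])                       # joint vectors z = [x; y]

dx = x.shape[1]
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(z)

def map_frame(x_t):
    """Conditional mean E[y | x_t] under the joint GMM."""
    resp = np.empty(gmm.n_components)
    cond = np.empty((gmm.n_components, z.shape[1] - dx))
    for m in range(gmm.n_components):
        mu_x = gmm.means_[m, :dx]
        mu_y = gmm.means_[m, dx:]
        S = gmm.covariances_[m]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        inv_Sxx = np.linalg.inv(Sxx)
        diff = x_t - mu_x
        # Marginal likelihood of x_t under component m (mixture weight).
        resp[m] = gmm.weights_[m] * np.exp(-0.5 * diff @ inv_Sxx @ diff) \
            / np.sqrt((2 * np.pi) ** dx * np.linalg.det(Sxx))
        # Per-component conditional mean of y given x_t.
        cond[m] = mu_y + Sxy.T @ inv_Sxx @ diff
    resp /= resp.sum()
    return resp @ cond

y_hat = np.array([map_frame(xi) for xi in x[:5]])
print(y_hat.shape)  # (5, 1)
```

Mapping each frame independently in this way is exactly what produces the discontinuities the paper reports; the MLE step adds delta (dynamic) features so the whole trajectory is estimated jointly.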


Bibliographic reference.  Toda, Tomoki / Black, Alan / Tokuda, Keiichi (2004): "Acoustic-to-articulatory inversion mapping with Gaussian mixture model", In INTERSPEECH-2004, 1129-1132.