INTERSPEECH 2004 - ICSLP
This paper describes acoustic-to-articulatory inversion mapping using a Gaussian mixture model (GMM). The correspondence between an acoustic parameter and an articulatory parameter is modeled by a GMM trained on parallel acoustic-articulatory data. We measure the performance of the GMM-based mapping and investigate the effectiveness of using multiple acoustic frames as an input feature and of using multiple mixtures. The results show that although increasing the number of mixtures reduces the estimation error, it also causes many discontinuities in the estimated articulatory trajectories. To address this problem, we apply maximum likelihood estimation (MLE) that considers articulatory dynamic features to the GMM-based mapping. Experimental results demonstrate that the MLE using dynamic features estimates more appropriate articulatory movements than the GMM-based mapping followed by smoothing with a lowpass filter.
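As an illustration of a frame-by-frame GMM-based mapping, the sketch below computes the conditional expectation of an articulatory vector y given an acoustic vector x under a joint GMM over z = [x; y]. This is one common formulation and is assumed here; the abstract does not state the exact estimator. The function name `gmm_map` and all parameter names are hypothetical, and the joint GMM parameters are taken as already trained.

```python
import numpy as np

def gmm_map(x, weights, means, covs, dx):
    """Estimate E[y | x] under a joint GMM over z = [x; y].

    x       : acoustic vector, shape (dx,)
    weights : mixture weights, shape (M,)
    means   : joint means, shape (M, dx + dy)
    covs    : joint full covariances, shape (M, dx + dy, dx + dy)
    dx      : dimensionality of the acoustic part
    (All names are illustrative assumptions, not from the paper.)
    """
    x = np.asarray(x, dtype=float)
    n_mix = len(weights)
    log_resp = np.empty(n_mix)
    cond_means = []
    for m in range(n_mix):
        mu_x, mu_y = means[m][:dx], means[m][dx:]
        s_xx = covs[m][:dx, :dx]   # acoustic covariance block
        s_yx = covs[m][dx:, :dx]   # cross-covariance block
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log of weight times the marginal Gaussian density of x.
        log_resp[m] = np.log(weights[m]) - 0.5 * (
            diff @ sol + logdet + dx * np.log(2.0 * np.pi)
        )
        # Conditional mean of y given x for mixture m.
        cond_means.append(mu_y + s_yx @ sol)
    # Normalize to posterior mixture probabilities in a stable way.
    log_resp -= np.max(log_resp)
    resp = np.exp(log_resp)
    resp /= resp.sum()
    # Posterior-weighted sum of the per-mixture conditional means.
    return resp @ np.array(cond_means)
```

Because each frame is mapped independently, the posterior weights can change abruptly between neighboring frames when many mixtures are used, which is consistent with the trajectory discontinuities the abstract reports.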
Bibliographic reference. Toda, Tomoki / Black, Alan / Tokuda, Keiichi (2004): "Acoustic-to-articulatory inversion mapping with Gaussian mixture model", In INTERSPEECH-2004, 1129-1132.