Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Mapping from Articulatory Movements to Vocal Tract Spectrum with Gaussian Mixture Model for Articulatory Speech Synthesis

Tomoki Toda (1,2), Alan W. Black (1), Keiichi Tokuda (2)

(1) Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
(2) Graduate School of Engineering, Nagoya Institute of Technology, Japan

This paper describes a method for determining the vocal tract spectrum from articulatory movements using a Gaussian Mixture Model (GMM) to synthesize speech from articulatory information. A GMM of the joint probability density of articulatory parameters and acoustic spectral parameters is trained on a parallel acoustic-articulatory speech database. We evaluate the performance of the GMM-based mapping using a spectral distortion measure. Experimental results demonstrate that the distortion can be reduced by using not only the articulatory parameters of the vocal tract but also power and voicing information as input features. Moreover, in order to determine the best mapping, we apply maximum likelihood estimation (MLE) to the GMM-based mapping method. Experimental results show that MLE using both static and dynamic features improves the mapping accuracy compared with the conventional GMM-based mapping.
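The conventional GMM-based mapping the abstract refers to can be sketched as follows: fit a full-covariance GMM on joint articulatory-spectral vectors [x; y], then map a new articulatory frame x to the posterior-weighted conditional mean E[y | x]. This is a minimal illustrative sketch, not the authors' implementation; the feature dimensions, component count, and toy data below are assumptions for demonstration only.

```python
# Minimal sketch of GMM-based feature mapping (MMSE estimate from a
# joint GMM), assuming illustrative dimensions and synthetic data.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(x, y, n_components=4, seed=0):
    """Fit a full-covariance GMM on the joint vectors [x; y]."""
    z = np.hstack([x, y])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(z)
    return gmm

def gmm_map(gmm, x, dx):
    """Map inputs x (shape [T, dx]) to E[y | x] under the joint GMM."""
    T, K = x.shape[0], gmm.n_components
    dy = gmm.means_.shape[1] - dx
    cond_means = np.zeros((K, T, dy))
    log_resp = np.zeros((T, K))
    for k in range(K):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        # Responsibility of component k given x (x-marginal only).
        log_resp[:, k] = (np.log(gmm.weights_[k]) +
                          multivariate_normal.logpdf(x, mu_x, Sxx))
        # Conditional mean E[y | x, k] = mu_y + Syx Sxx^{-1} (x - mu_x).
        cond_means[k] = mu_y + (x - mu_x) @ np.linalg.solve(Sxx, Sxy)
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of per-component conditional means.
    return np.einsum("tk,ktd->td", resp, cond_means)

# Toy demo: y depends (noisily) linearly on x, so the mapping should
# roughly recover that relation.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y = x @ np.array([[1.0, 0.5], [-0.5, 1.0]]) + 0.05 * rng.normal(size=(500, 2))
gmm = fit_joint_gmm(x, y, n_components=2)
y_hat = gmm_map(gmm, x, dx=2)
print(np.mean((y_hat - y) ** 2))
```

The paper's MLE extension additionally constrains the mapped trajectory with dynamic (delta) features, which this frame-by-frame sketch does not model.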


Bibliographic reference.  Toda, Tomoki / Black, Alan W. / Tokuda, Keiichi (2004): "Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis", In SSW5-2004, 31-36.