EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


High Quality Voice Conversion Based on Gaussian Mixture Model with Dynamic Frequency Warping

Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Nara Institute of Science and Technology, Japan

In the voice conversion algorithm based on the Gaussian Mixture Model (GMM), the quality of the converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a GMM-based algorithm combined with Dynamic Frequency Warping (DFW) to avoid this over-smoothing. To avoid degrading the conversion accuracy on speaker individuality, we also propose calculating the converted spectrum by mixing the GMM-based converted spectrum with the DFW-based converted spectrum. Evaluation experiments show that, with a proper weight for mixing the spectra, the proposed algorithm produces converted speech of better quality than the GMM-based algorithm while achieving the same conversion accuracy on speaker individuality.
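The abstract names three ingredients: a GMM-converted spectrum, a DFW-based converted spectrum, and a weighted mix of the two. The sketch below illustrates only the last two steps for a single frame, under assumptions that are not taken from the paper: spectra are given as log-amplitude values on a common frequency grid; the DFW-based spectrum is obtained by warping the source spectrum along the frequency axis so that it aligns with the GMM-converted spectrum (an unconstrained DTW-style dynamic program, without the slope constraints a practical system would use); and mixing is a linear interpolation in the log-amplitude domain with a single scalar weight `w`. The function names `dfw_path`, `warp_source`, and `mix_spectra` are hypothetical.

```python
def dfw_path(src, tgt):
    """Align source bins to target bins by dynamic programming along frequency.

    Returns a monotone list of (i, j) pairs from (0, 0) to (len(src)-1, len(tgt)-1).
    Assumption: plain DTW with absolute log-amplitude distance, no slope limits.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(src[i - 1] - tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack by always stepping to the cheapest predecessor cell.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def warp_source(src, path, m):
    """Warp the source spectrum onto m target bins (average of aligned source bins)."""
    acc = [0.0] * m
    cnt = [0] * m
    for i, j in path:
        acc[j] += src[i]
        cnt[j] += 1
    return [a / c for a, c in zip(acc, cnt)]


def mix_spectra(gmm_log, dfw_log, w):
    """Mix GMM-based and DFW-based log spectra; w=0 gives GMM only, w=1 gives DFW only.

    Assumption: one scalar weight for all frequencies, applied in the log domain.
    """
    if len(gmm_log) != len(dfw_log):
        raise ValueError("spectra must have the same length")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * d + (1.0 - w) * g for g, d in zip(gmm_log, dfw_log)]


if __name__ == "__main__":
    source = [0.0, 0.0, 5.0, 0.0, 0.0]    # toy log spectrum with a peak at bin 2
    gmm_conv = [0.0, 5.0, 0.0, 0.0, 0.0]  # toy GMM-converted spectrum, peak at bin 1
    warped = warp_source(source, dfw_path(source, gmm_conv), len(gmm_conv))
    print(warped)  # prints [0.0, 5.0, 0.0, 0.0, 0.0]
    print(mix_spectra(gmm_conv, warped, 0.5))
```

In this toy example the warped source reproduces the target peak position while keeping the source's own values, which is the intuition behind using DFW to retain spectral detail that GMM averaging smooths away; the weight `w` then trades that detail against the GMM mapping.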


Bibliographic reference.  Toda, Tomoki / Saruwatari, Hiroshi / Shikano, Kiyohiro (2001): "High quality voice conversion based on Gaussian mixture model with dynamic frequency warping", In EUROSPEECH-2001, 349-352.