ISCA Archive Eurospeech 2001

High quality voice conversion based on Gaussian mixture model with dynamic frequency warping

Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

In the voice conversion algorithm based on the Gaussian Mixture Model (GMM), the quality of the converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a GMM-based algorithm combined with Dynamic Frequency Warping (DFW) to avoid this over-smoothing. To avoid degrading the conversion accuracy on speaker individuality, we also propose calculating the converted spectrum by mixing the GMM-based converted spectrum with the DFW-based converted spectrum. Evaluation experiments show that, with a proper weight for mixing the spectra, the proposed algorithm produces converted speech of better quality than the GMM-based algorithm while achieving the same conversion accuracy on speaker individuality.
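The abstract does not give the warping or mixing formulas. The following Python is only a rough sketch under stated assumptions. It assumes that, for one frame, a DFW path is found by dynamic programming between the source spectrum and the GMM-converted spectrum. It also assumes this warping is applied to the source spectrum, which keeps its fine structure, and that the warped spectrum is then mixed with the GMM-converted spectrum in the linear amplitude domain using one constant weight `w`. The function names (`dfw_path`, `warp_spectrum`, `mix_spectra`, `convert_frame`), the log-amplitude distance, and the DP step pattern are hypothetical illustrations, not the authors' implementation.

```python
import math


def dfw_path(src, tgt):
    """Hypothetical DFW: DTW-style dynamic programming along the frequency axis.

    Aligns bins of `src` to bins of `tgt` (both positive amplitudes) using
    |log src[i] - log tgt[j]| as the local distance and the step pattern
    (i-1, j-1), (i-1, j), (i, j-1). Returns the path as (i, j) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]

    def dist(i, j):
        return abs(math.log(src[i]) - math.log(tgt[j]))

    cost[0][0] = dist(0, 0)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best, arg = inf, None
            # Diagonal is checked first, so it wins ties.
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, arg = cost[pi][pj], (pi, pj)
            cost[i][j] = best + dist(i, j)
            back[i][j] = arg

    # Backtrack from the last bin pair to (0, 0).
    path, cell = [], (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return path


def warp_spectrum(src, path, m):
    """Map `src` onto m target bins, averaging source bins aligned to each bin."""
    sums, counts = [0.0] * m, [0] * m
    for i, j in path:
        sums[j] += src[i]
        counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]


def mix_spectra(dfw_spec, gmm_spec, w):
    """Assumed mixing rule: w * DFW spectrum + (1 - w) * GMM spectrum."""
    return [w * a + (1.0 - w) * b for a, b in zip(dfw_spec, gmm_spec)]


def convert_frame(src, gmm_spec, w):
    """Warp the source spectrum toward the GMM output, then mix the two."""
    path = dfw_path(src, gmm_spec)
    warped = warp_spectrum(src, path, len(gmm_spec))
    return mix_spectra(warped, gmm_spec, w)
```

For example, `convert_frame([1, 1, 4, 1, 1], [1, 4, 1, 1, 1], 0.5)` moves the source peak from bin 2 to bin 1 before mixing. Setting `w = 0` reduces the result to the plain GMM output, and `w = 1` keeps only the warped source spectrum. The weight therefore trades the detail kept by DFW against the speaker-individuality accuracy of the GMM mapping.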


doi: 10.21437/Eurospeech.2001-108

Cite as: Toda, T., Saruwatari, H., Shikano, K. (2001) High quality voice conversion based on Gaussian mixture model with dynamic frequency warping. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 349-352, doi: 10.21437/Eurospeech.2001-108

@inproceedings{toda01_eurospeech,
  author={Tomoki Toda and Hiroshi Saruwatari and Kiyohiro Shikano},
  title={{High quality voice conversion based on Gaussian mixture model with dynamic frequency warping}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={349--352},
  doi={10.21437/Eurospeech.2001-108}
}