ISCA Archive Interspeech 2006

Efficient Gaussian mixture model evaluation in voice conversion

Jilei Tian, Jani Nurminen, Victor Popa

Voice conversion refers to adapting the characteristics of a source speaker's voice to those of a target speaker. Gaussian mixture models (GMMs) have been found to be efficient in the voice conversion task. The GMM parameters are estimated from a training set with the goal of minimizing the mean squared error (MSE) between the transformed and target vectors. The quality of the GMM therefore plays an important role in achieving good voice conversion quality. This paper presents a very efficient approach for evaluating GMMs directly from the model parameters, without using any test data, which facilitates improving the transformation performance, especially in embedded implementations. Although the proposed approach can be used in any application that utilizes GMM-based transformation, we use voice conversion as the example application throughout the paper. The proposed approach is tested in this context and evaluated against an MSE-based evaluation method. The results show that the proposed method is in line with all subjective observations and MSE results.
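For context, the baseline the abstract compares against needs test data: each source vector is converted and the squared error to its paired target vector is averaged. The sketch below shows that baseline using the conversion function commonly used in GMM-based voice conversion, F(x) = Σ_i p_i(x)[ν_i + Γ_i Σ_i^{-1}(x − μ_i)], where p_i(x) is the posterior probability of component i. It is not the authors' parameter-only evaluation method, which the abstract does not describe. Assumptions: features are scalars for brevity (real systems use vectors and matrices), and the names `JointComponent`, `convert`, and `mse` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class JointComponent:
    """One GMM component (hypothetical layout; scalar features for brevity)."""
    weight: float  # mixture weight alpha_i (must be > 0)
    mu_x: float    # source-side mean mu_i
    mu_y: float    # target-side offset nu_i
    var_xx: float  # source variance Sigma_i
    cov_yx: float  # cross term Gamma_i


def log_gauss(x: float, mu: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mu) ** 2 / var + math.log(2.0 * math.pi * var))


def posteriors(x: float, gmm: Sequence[JointComponent]) -> List[float]:
    """p_i(x), computed in the log domain to avoid underflow far from all means."""
    logs = [math.log(c.weight) + log_gauss(x, c.mu_x, c.var_xx) for c in gmm]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x: float, gmm: Sequence[JointComponent]) -> float:
    """F(x) = sum_i p_i(x) * (nu_i + Gamma_i / Sigma_i * (x - mu_i))."""
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(posteriors(x, gmm), gmm)
    )


def mse(sources: Sequence[float], targets: Sequence[float],
        gmm: Sequence[JointComponent]) -> float:
    """Test-data MSE between converted source values and paired target values."""
    errors = [(convert(x, gmm) - y) ** 2 for x, y in zip(sources, targets)]
    return sum(errors) / len(errors)
```

With a single component (weight 1, μ = 0, ν = 1, Σ = 1, Γ = 2), F reduces to the linear map F(x) = 1 + 2x. So `convert(3.0, gmm)` gives 7.0, and `mse([0.0, 1.0], [1.0, 3.0], gmm)` is 0.0 because both targets lie exactly on that line. The cost of this baseline grows with the amount of test data and the number of components, which is the overhead a parameter-only evaluation aims to avoid on embedded targets.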

doi: 10.21437/Interspeech.2006-586

Cite as: Tian, J., Nurminen, J., Popa, V. (2006) Efficient Gaussian mixture model evaluation in voice conversion. Proc. Interspeech 2006, paper 1533-Thu1BuP.9, doi: 10.21437/Interspeech.2006-586

  author={Jilei Tian and Jani Nurminen and Victor Popa},
  title={{Efficient Gaussian mixture model evaluation in voice conversion}},
  booktitle={Proc. Interspeech 2006},
  pages={paper 1533-Thu1BuP.9},