ISCA Archive Interspeech 2006

Eigenvoice conversion based on Gaussian mixture model

Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano

This paper describes a novel framework of voice conversion (VC). We call it eigenvoice conversion (EVC). We apply EVC to the conversion from a source speaker's voice to arbitrary target speakers' voices. Using multiple parallel data sets consisting of utterance-pairs of the source and multiple pre-stored target speakers, a canonical eigenvoice GMM (EV-GMM) is trained in advance. That conversion model enables us to flexibly control the speaker individuality of the converted speech by manually setting weight parameters. In addition, the optimum weight set for a specific target speaker is estimated using only speech data of the target speaker without any linguistic restrictions. We evaluate the performance of EVC by a spectral distortion measure. Experimental results demonstrate that EVC works very well even if we use only a few utterances of the target speaker for the weight estimation.
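
The abstract does not give the model equations, so the following is only a minimal sketch of how weight parameters could control the target side of an eigenvoice-style GMM. It assumes that each mixture component's target mean is a bias vector plus a weighted sum of basis vectors (eigenvectors) shared through one weight vector. The function name, array shapes, and random values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def ev_gmm_target_means(bias, eigvecs, weights):
    """Assumed eigenvoice-style parameterization of target means.

    bias:    (M, D)    bias vector for each of M mixture components
    eigvecs: (M, D, J) J basis vectors for each mixture component
    weights: (J,)      one weight vector shared by all components
    returns: (M, D)    target mean vector for each mixture component
    """
    # For every component m: mean_m = bias_m + eigvecs_m @ weights
    return bias + eigvecs @ weights

# Toy sizes (hypothetical): 4 mixtures, 3-dim features, 2 basis vectors.
rng = np.random.default_rng(0)
M, D, J = 4, 3, 2
bias = rng.normal(size=(M, D))
eigvecs = rng.normal(size=(M, D, J))

# Zero weights leave only the bias vectors.
means_zero = ev_gmm_target_means(bias, eigvecs, np.zeros(J))

# Changing the weights moves all component means together.
means_w = ev_gmm_target_means(bias, eigvecs, np.array([0.5, -1.0]))
```

Under this assumption, manually setting the weight vector shifts every component mean at once, which is one way a small number of parameters could control speaker individuality. Estimating the weights from a target speaker's speech is not shown here.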


doi: 10.21437/Interspeech.2006-613

Cite as: Toda, T., Ohtani, Y., Shikano, K. (2006) Eigenvoice conversion based on Gaussian mixture model. Proc. Interspeech 2006, paper 1717-Thu2A3O.5, doi: 10.21437/Interspeech.2006-613

@inproceedings{toda06_interspeech,
  author={Tomoki Toda and Yamato Ohtani and Kiyohiro Shikano},
  title={{Eigenvoice conversion based on Gaussian mixture model}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1717-Thu2A3O.5},
  doi={10.21437/Interspeech.2006-613}
}