This paper describes a novel voice conversion (VC) framework, which we call eigenvoice conversion (EVC). We apply EVC to the conversion from a source speaker's voice to the voices of arbitrary target speakers. A canonical eigenvoice GMM (EV-GMM) is trained in advance on multiple parallel data sets, each consisting of utterance pairs from the source speaker and one of several pre-stored target speakers. This conversion model lets us flexibly control the speaker individuality of the converted speech by manually setting its weight parameters. In addition, the optimum weight set for a specific target speaker is estimated using only that speaker's speech data, without any linguistic restrictions. We evaluate the performance of EVC with a spectral distortion measure. Experimental results demonstrate that EVC works very well even when only a few utterances of the target speaker are used for weight estimation.
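To illustrate how a small set of weights can steer speaker individuality, the sketch below assumes the common eigenvoice formulation, in which each target-side mean vector of a mixture component is a bias vector plus a weighted sum of eigenvectors (for example, obtained by PCA over the pre-stored target speakers' mean supervectors). The function name `eigenvoice_mean` and the plain-list representation are hypothetical choices for illustration. This is not the authors' implementation, and it omits the estimation of the weights from the target speaker's data.

```python
from typing import List, Sequence

Vector = List[float]


def eigenvoice_mean(
    bias: Sequence[float],
    eigenvectors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Vector:
    """Hypothetical helper: build one component's target mean as
    bias + sum_s weights[s] * eigenvectors[s] (assumed eigenvoice model)."""
    if len(eigenvectors) != len(weights):
        raise ValueError("need exactly one weight per eigenvector")
    mean = list(bias)  # start from the bias vector
    for w, vec in zip(weights, eigenvectors):
        if len(vec) != len(mean):
            raise ValueError("eigenvector dimension does not match bias")
        # Add the weighted eigenvector element by element.
        for d, v in enumerate(vec):
            mean[d] += w * v
    return mean


if __name__ == "__main__":
    # Toy 2-D example with two eigenvectors; changing the weights moves the mean.
    print(eigenvoice_mean([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -1.0]))
```

Under this assumed model, setting all weights to zero returns the bias vector unchanged, and each weight moves the converted voice along one eigenvoice direction. This is why manually adjusting a handful of weights can change the speaker individuality without retraining the whole GMM.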
Cite as: Toda, T., Ohtani, Y., Shikano, K. (2006) Eigenvoice conversion based on Gaussian mixture model. Proc. Interspeech 2006, paper 1717-Thu2A3O.5, doi: 10.21437/Interspeech.2006-613
@inproceedings{toda06_interspeech, author={Tomoki Toda and Yamato Ohtani and Kiyohiro Shikano}, title={{Eigenvoice conversion based on Gaussian mixture model}}, year=2006, booktitle={Proc. Interspeech 2006}, pages={paper 1717-Thu2A3O.5}, doi={10.21437/Interspeech.2006-613} }