ISCA Archive Interspeech 2008

An improved one-to-many eigenvoice conversion system

Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

We have previously developed a one-to-many eigenvoice conversion (EVC) system enabling the conversion from a specific source speaker's voice into an arbitrary target speaker's voice. In this system, an eigenvoice Gaussian mixture model (EV-GMM) is trained in advance with multiple parallel data sets composed of utterance pairs of the source speaker and many pre-stored target speakers. The EV-GMM is then adapted effectively to an arbitrary target speaker using a small amount of adaptation data. Although this system allows very flexible training of the conversion model, the quality of the converted speech is still not high enough. To alleviate this problem, we simultaneously apply the following promising techniques to the one-to-many EVC system: 1) STRAIGHT mixed excitation, 2) the conversion algorithm considering global variance, and 3) speaker adaptive training of the EV-GMM. Experimental results demonstrate that the proposed system yields remarkable improvements in the performance of EVC.
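
The abstract gives no formulas, so the following is only a minimal sketch of two ideas it names, under stated assumptions. It assumes the form commonly used in eigenvoice approaches, in which each mixture's target mean is a bias vector plus a weighted sum of eigenvectors, and it takes the global variance to be the per-dimension variance of a feature sequence over one utterance. The function names, array shapes, and variable names are hypothetical and do not come from the paper.

```python
import numpy as np


def adapted_target_means(bias, eigenvectors, weights):
    """Build target mean vectors for every mixture component.

    Assumed shapes (illustrative, not taken from the paper):
      bias:         (M, D)    one bias vector per mixture component
      eigenvectors: (M, D, J) J eigenvectors per mixture component
      weights:      (J,)      speaker-dependent weights, shared by all mixtures
    Returns an (M, D) array of adapted target means.
    """
    # (M, D, J) @ (J,) -> (M, D): weighted sum of eigenvectors per mixture
    return bias + eigenvectors @ weights


def global_variance(features):
    """Per-dimension variance of a (T, D) feature sequence over its T frames."""
    return features.var(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, D, J = 4, 3, 2  # mixtures, feature dimensions, eigenvectors (toy sizes)
    means = adapted_target_means(
        rng.normal(size=(M, D)), rng.normal(size=(M, D, J)), rng.normal(size=J)
    )
    print(means.shape)
    print(global_variance(rng.normal(size=(50, D))).shape)
```

In this sketch, adapting to a new target speaker would amount to estimating only the small weight vector, which is consistent with the abstract's point that a small amount of adaptation data suffices; the estimation procedure itself is not shown.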

doi: 10.21437/Interspeech.2008-333

Cite as: Ohtani, Y., Toda, T., Saruwatari, H., Shikano, K. (2008) An improved one-to-many eigenvoice conversion system. Proc. Interspeech 2008, 1080-1083, doi: 10.21437/Interspeech.2008-333

  author={Yamato Ohtani and Tomoki Toda and Hiroshi Saruwatari and Kiyohiro Shikano},
  title={{An improved one-to-many eigenvoice conversion system}},
  booktitle={Proc. Interspeech 2008},