9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

An Improved One-to-Many Eigenvoice Conversion System

Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

NAIST, Japan

We have previously developed a one-to-many eigenvoice conversion (EVC) system that converts a specific source speaker's voice into an arbitrary target speaker's voice. In this system, an eigenvoice Gaussian mixture model (EV-GMM) is trained in advance on multiple parallel data sets, each composed of utterance pairs of the source speaker and one of many pre-stored target speakers. The EV-GMM can then be effectively adapted to an arbitrary target speaker using only a small amount of adaptation data. Although this system makes training of the conversion model very flexible, the quality of the converted speech is still not high enough. To alleviate this problem, we simultaneously apply the following promising techniques to the one-to-many EVC system: 1) STRAIGHT mixed excitation, 2) a conversion algorithm considering global variance, and 3) speaker adaptive training of the EV-GMM. Experimental results demonstrate that the proposed system yields remarkable improvements in EVC performance.
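The abstract does not give the model equations. The following is a minimal sketch of two quantities it mentions. It assumes the EV-GMM formulation used in earlier EVC work, where each mixture's target mean vector is a bias vector plus a weighted sum of eigenvectors (mu_i = B_i w + b_i^(0)), and it assumes global variance means the per-dimension variance of a feature sequence over one utterance. The function names and toy numbers are hypothetical. In practice the weight vector w would be estimated from the adaptation data, for example by maximum likelihood, and this sketch does not perform that estimation.

```python
from typing import List, Sequence

Vector = List[float]
# Rows correspond to feature dimensions, columns to eigenvectors.
Matrix = Sequence[Sequence[float]]


def adapted_target_mean(bias: Sequence[float], basis: Matrix,
                        weights: Sequence[float]) -> Vector:
    """Assumed EV-GMM target mean of one mixture: b_i^(0) + B_i w."""
    if len(basis) != len(bias):
        raise ValueError("basis must have one row per feature dimension")
    mean = []
    for b0, row in zip(bias, basis):
        if len(row) != len(weights):
            raise ValueError("each basis row needs one entry per weight")
        # Bias plus the weighted sum of eigenvector components.
        mean.append(b0 + sum(r * w for r, w in zip(row, weights)))
    return mean


def global_variance(frames: Sequence[Sequence[float]]) -> Vector:
    """Per-dimension variance of a feature sequence over all frames."""
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / num_frames
            for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical 2-D features with two eigenvectors.
    print(adapted_target_mean([1.0, 2.0], [[0.5, 0.0], [0.0, 1.0]], [2.0, -1.0]))
    print(global_variance([[0.0, 0.0], [2.0, 4.0]]))
```

Under these assumptions, adapting to a new target speaker only changes the small weight vector w, which explains why a small amount of adaptation data can be enough. A GV-aware conversion would compare a statistic like `global_variance` of the converted sequence against the target's.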


Bibliographic reference.  Ohtani, Yamato / Toda, Tomoki / Saruwatari, Hiroshi / Shikano, Kiyohiro (2008): "An improved one-to-many eigenvoice conversion system", In INTERSPEECH-2008, 1080-1083.