8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Speaker Adaptive Training for One-to-Many Eigenvoice Conversion Based on Gaussian Mixture Model

Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

NAIST, Japan

One-to-many eigenvoice conversion (EVC) converts the voice of a specific source speaker into that of arbitrary target speakers. An eigenvoice Gaussian mixture model (EV-GMM) is trained in advance on multiple parallel data sets consisting of the source speaker and many pre-stored target speakers. The EV-GMM is then adapted to an arbitrary target speaker from only a few utterances by estimating a small number of free parameters. Because only these few parameters are estimated during adaptation, the initial EV-GMM directly affects the conversion performance of the adapted EV-GMM. To prepare a better initial model, this paper proposes speaker adaptive training (SAT) of a canonical EV-GMM in one-to-many EVC. Objective and subjective evaluations demonstrate that SAT significantly improves the performance of EVC.

Bibliographic reference. Ohtani, Yamato / Toda, Tomoki / Saruwatari, Hiroshi / Shikano, Kiyohiro (2007): "Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model", in INTERSPEECH-2007, 1981-1984.