11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Probabilistic Integration of Joint Density Model and Speaker Model for Voice Conversion

Daisuke Saito (1), Shinji Watanabe (2), Atsushi Nakamura (2), Nobuaki Minematsu (1)

(1) University of Tokyo, Japan
(2) NTT Corporation, Japan

This paper describes a novel approach to voice conversion that uses both a joint density model and a speaker model. In voice conversion studies, approaches based on a Gaussian mixture model (GMM) of the joint density are widely used to estimate a transformation. However, to achieve sufficient quality, they require a parallel corpus containing many utterances with the same linguistic content spoken by both speakers. In addition, joint density GMM methods often suffer from over-training when the amount of training data is small. To address these problems, we propose integrating the speaker GMM of the target with the joint density model using a probabilistic formulation. The proposed method trains the joint density model and the speaker model independently, which eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the parallel corpus is small.
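
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the conventional joint density GMM mapping, in which a source feature x is converted to the conditional mean of the target feature y under a GMM fitted to joint (x, y) vectors. It is not the integration method proposed in the paper, whose formulation the abstract does not give. Scalar features are assumed for brevity, and every name in it (JointComponent, posteriors, convert) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One mixture component of a joint density GMM over scalar (x, y) pairs."""
    weight: float   # mixture weight
    mean_x: float   # mean of the source feature
    mean_y: float   # mean of the target feature
    var_x: float    # variance of the source feature
    cov_xy: float   # covariance between source and target features
    # The target variance is not needed for the conditional mean, so it is omitted.


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, components):
    """Posterior probability of each component given the source feature x."""
    likes = [c.weight * gaussian_pdf(x, c.mean_x, c.var_x) for c in components]
    total = sum(likes)
    return [like / total for like in likes]


def convert(x, components):
    """Conditional mean E[y | x]: posterior-weighted sum of per-component regressions."""
    post = posteriors(x, components)
    return sum(
        p * (c.mean_y + c.cov_xy / c.var_x * (x - c.mean_x))
        for p, c in zip(post, components)
    )
```

In practice the features are spectral vectors and the component parameters are estimated from time-aligned parallel utterances, which is where the dependence on a large parallel corpus described in the abstract arises.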


Bibliographic reference. Saito, Daisuke / Watanabe, Shinji / Nakamura, Atsushi / Minematsu, Nobuaki (2010): "Probabilistic integration of joint density model and speaker model for voice conversion", In INTERSPEECH-2010, 1728-1731.