12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Study on Bag of Gaussian Model with Application to Voice Conversion

Yu Qiao (1), Tong Tong (1), Nobuaki Minematsu (2)

(1) Chinese Academy of Sciences, China
(2) University of Tokyo, Japan

GMM-based mapping techniques have proved to be an efficient way to find a nonlinear regression function between two spaces, and have found success in voice conversion. In these methods, a linear transformation is estimated for each Gaussian component, and the final conversion function is a weighted sum of all the linear transformations. These linear transformations fit samples near the center of at least one Gaussian component well, but may not handle samples that lie far from the centers of all the Gaussian distributions. To overcome this problem, this paper proposes the Bag of Gaussian Model (BGM). A BGM consists of two types of Gaussian distributions, namely basic and complex distributions. Compared with the classical GMM, BGM is adaptive to samples: for a given sample, BGM can select the set of Gaussian distributions that fits the sample best. We develop a data-driven method to construct a BGM and show how to estimate a regression function with it. We carry out experiments on voice conversion tasks. The experimental results demonstrate the usefulness of BGM-based methods.
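
The classical GMM-based mapping that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration, not the authors' implementation and not BGM itself: it assumes the common joint-density form, in which each component k contributes the local linear map mu_y[k] + cov_yx[k] cov_xx[k]^{-1} (x - mu_x[k]) and the weights are the posterior probabilities p(k | x). The function and parameter names (gmm_convert, mu_x, mu_y, cov_xx, cov_yx) are hypothetical and chosen only for this example.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at a single vector x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map a source vector x to the target space with a GMM-based regression.

    Assumed (hypothetical) parameter layout for K components:
      weights: (K,)        mixture weights
      mu_x:    (K, dx)     source means
      mu_y:    (K, dy)     target means
      cov_xx:  (K, dx, dx) source covariances
      cov_yx:  (K, dy, dx) target/source cross-covariances
    Returns the converted vector and the posterior weights p(k | x).
    """
    n_components = len(weights)

    # Posterior probability of each component given x, computed in the
    # log domain for numerical stability.
    log_post = np.array([
        np.log(weights[k]) + gaussian_logpdf(x, mu_x[k], cov_xx[k])
        for k in range(n_components)
    ])
    log_post -= log_post.max()
    post = np.exp(log_post)
    post /= post.sum()

    # Weighted sum of the per-component linear transformations.
    y = np.zeros(mu_y.shape[1])
    for k in range(n_components):
        local = mu_y[k] + cov_yx[k] @ np.linalg.solve(cov_xx[k], x - mu_x[k])
        y += post[k] * local
    return y, post
```

Under these assumptions, a sample close to one component's mean is converted almost entirely by that component's linear map, while a sample far from every center still receives normalized weights that sum to one even though no component models it well. That is the situation the abstract identifies as the weakness of the classical approach.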


Bibliographic reference.  Qiao, Yu / Tong, Tong / Minematsu, Nobuaki (2011): "A study on bag of Gaussian model with application to voice conversion", In INTERSPEECH-2011, 657-660.