8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Estimation of GMM in Voice Conversion Including Unaligned Data

Helenca Duxans, Antonio Bonafonte

Universitat Politècnica de Catalunya, Spain

Voice conversion consists of transforming a source speaker's voice into a target speaker's voice. In many applications of voice conversion systems, the amounts of training data available from the source speaker and from the target speaker differ. Usually a large amount of source data is available, but the transformation must be estimated from only a small amount of target data. Systems based on joint Gaussian Mixture Models (GMM) are well suited to voice conversion [1], but they cannot use source data that lacks corresponding aligned target data. In this paper, two alternatives are studied for incorporating unaligned source data in the estimation of a GMM for a voice conversion task. It is shown that when only a limited amount of aligned parameters is available in the training step, including additional data from the source speaker alone improves the performance of the voice transformation.


Bibliographic reference. Duxans, Helenca / Bonafonte, Antonio (2003): "Estimation of GMM in voice conversion including unaligned data", In EUROSPEECH-2003, 861-864.