EUROSPEECH 2003 - INTERSPEECH 2003
Voice conversion consists of transforming the voice of a source speaker into that of a target speaker. In many applications of voice conversion systems, the amounts of training data available from the source speaker and from the target speaker differ. Usually a large amount of source data is available, but the transformation must be estimated with only a small amount of target data. Systems based on joint Gaussian Mixture Models (GMM) are well suited to voice conversion, but they cannot use source data that lacks corresponding aligned target data. In this paper, two alternatives are studied for incorporating unaligned source data in the estimation of a GMM for a voice conversion task. It is shown that, when only a limited amount of aligned parameters is available in the training step, including additional data that comes from the source speaker alone improves the performance of the voice transformation.
Bibliographic reference. Duxans, Helenca / Bonafonte, Antonio (2003): "Estimation of GMM in voice conversion including unaligned data", In EUROSPEECH-2003, 861-864.