ISCA Archive Interspeech 2006

Voice conversion based on mixtures of factor analyzers

Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda

This paper describes voice conversion based on Mixtures of Factor Analyzers (MFA), which can model data efficiently with a limited amount of training data. As a typical spectral conversion method, a mapping algorithm based on the Gaussian Mixture Model (GMM) has been proposed. In this method, two kinds of covariance matrix structure are often used: diagonal and full. A GMM with diagonal covariance matrices requires a large number of mixture components to estimate spectral features accurately. A GMM with full covariance matrices, on the other hand, needs sufficient training data to estimate its model parameters. To cope with these problems, we apply MFA to voice conversion. MFA can be regarded as an intermediate model between a GMM with diagonal covariance matrices and a GMM with full covariance matrices. Experimental results show that MFA can improve conversion accuracy compared with the conventional GMM.
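To illustrate why MFA sits between the two GMM covariance structures, the following minimal sketch counts the free covariance parameters per mixture component and builds a covariance matrix in the standard factor-analysis form (a low-rank loading matrix times its transpose, plus a diagonal noise term). The function names, the feature dimension of 48, and the choice of 3 factors are illustrative assumptions, not values taken from the paper.

```python
import random


def covariance_param_counts(dim, num_factors):
    """Free covariance parameters per mixture component.

    The MFA count is the raw size of the loading matrix plus the diagonal
    noise variances; it ignores the rotational redundancy of the loadings.
    """
    return {
        "diagonal": dim,
        "mfa": dim * num_factors + dim,
        "full": dim * (dim + 1) // 2,
    }


def mfa_covariance(loading, noise_var):
    """Build Sigma = Lambda Lambda^T + Psi, where Psi is diagonal.

    loading:   dim x num_factors matrix (list of rows), i.e. Lambda
    noise_var: length-dim list of positive variances, the diagonal of Psi
    """
    dim = len(loading)
    return [
        [
            sum(a * b for a, b in zip(loading[i], loading[j]))
            + (noise_var[i] if i == j else 0.0)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


# Hypothetical sizes: 48-dimensional feature vectors, 3 factors per component.
DIM, NUM_FACTORS = 48, 3
counts = covariance_param_counts(DIM, NUM_FACTORS)

rng = random.Random(0)
loading = [[rng.gauss(0.0, 1.0) for _ in range(NUM_FACTORS)] for _ in range(DIM)]
noise_var = [rng.uniform(0.1, 1.0) for _ in range(DIM)]
sigma = mfa_covariance(loading, noise_var)

print(counts)
```

Under these assumed sizes, each MFA component carries more covariance parameters than a diagonal one but far fewer than a full one, while still modeling correlations between feature dimensions through the shared factors.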

doi: 10.21437/Interspeech.2006-585

Cite as: Uto, Y., Nankaku, Y., Toda, T., Lee, A., Tokuda, K. (2006) Voice conversion based on mixtures of factor analyzers. Proc. Interspeech 2006, paper 2076-Thu1BuP.8, doi: 10.21437/Interspeech.2006-585

  author={Yosuke Uto and Yoshihiko Nankaku and Tomoki Toda and Akinobu Lee and Keiichi Tokuda},
  title={{Voice conversion based on mixtures of factor analyzers}},
  booktitle={Proc. Interspeech 2006},
  pages={paper 2076-Thu1BuP.8},