ISCA Archive Interspeech 2006

Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation

Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

The performance of voice conversion has been considerably improved through statistical modeling of spectral sequences. However, the converted speech still contains traces of artificial sounds. To alleviate this, it is necessary to statistically model the excitation source sequence as well as the spectral sequence. In this paper, we introduce STRAIGHT mixed excitation into a voice conversion framework based on a Gaussian Mixture Model (GMM) of the joint probability density of source and target features. We convert both the spectral and the source feature sequences based on Maximum Likelihood Estimation (MLE). Objective and subjective evaluation results demonstrate that the proposed source conversion yields strong improvements in both converted speech quality and conversion accuracy for speaker individuality.
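To make the joint-density idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a GMM has already been trained on stacked vectors [x; y], where x is a feature vector from the speaker being converted and y is the time-aligned feature vector from the target speaker. It applies a simplified frame-by-frame conditional-mean mapping, which is a stand-in for the sequence-level MLE conversion described in the abstract, and it does not model STRAIGHT mixed excitation. The names `convert_frame`, `weights`, `means` and `covs` are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_frame(x, weights, means, covs):
    """Map one input frame x (shape (d,)) to a converted frame.

    Assumed layout (hypothetical, for illustration only):
      weights: (M,)        mixture weights
      means:   (M, 2d)     joint means, input part first, target part second
      covs:    (M, 2d, 2d) joint covariances in the same block order
    """
    d = x.shape[0]

    # Posterior probability of each mixture component given the input frame,
    # computed from the marginal Gaussian over the input part.
    log_post = np.array([
        np.log(w) + gaussian_logpdf(x, mu[:d], S[:d, :d])
        for w, mu, S in zip(weights, means, covs)
    ])
    log_post -= log_post.max()  # stabilise the exponentiation
    post = np.exp(log_post)
    post /= post.sum()

    # Posterior-weighted sum of the per-component conditional means E[y | x, m].
    y = np.zeros(d)
    for p, mu, S in zip(post, means, covs):
        cond_mean = mu[d:] + S[d:, :d] @ np.linalg.solve(S[:d, :d], x - mu[:d])
        y += p * cond_mean
    return y
```

In this sketch each frame is converted independently. A sequence-level maximum likelihood conversion would instead estimate the whole output trajectory jointly, so the code above should be read only as an illustration of how a joint-density GMM relates input features to target features.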


doi: 10.21437/Interspeech.2006-582

Cite as: Ohtani, Y., Toda, T., Saruwatari, H., Shikano, K. (2006) Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. Proc. Interspeech 2006, paper 1681-Thu1BuP.5, doi: 10.21437/Interspeech.2006-582

@inproceedings{ohtani06_interspeech,
  author={Yamato Ohtani and Tomoki Toda and Hiroshi Saruwatari and Kiyohiro Shikano},
  title={{Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1681-Thu1BuP.5},
  doi={10.21437/Interspeech.2006-582}
}