This paper addresses the one-to-many mapping problem in Voice Conversion (VC) by exploring source-to-target mappings in GMM-based spectral transformation. Specifically, we examine the differences between using source-only and joint source/target information in the classification stage of transformation, which illustrates a one-to-many effect in the traditional acoustically-based GMM. We propose combating this effect by using phonetic information in GMM learning and classification. We then demonstrate the success of the proposed context-dependent modeling with transformation results evaluated using an objective error criterion. Finally, we discuss the implications of our work for adapting current approaches to VC.
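The contrast between source-only and joint source/target classification can be illustrated with a small sketch. This is not the authors' implementation: the two-component joint GMM below, its weights, means, and covariances, and the helper names `gaussian_logpdf` and `posteriors` are all hypothetical, chosen only so that two components overlap in the source space while being well separated in the target space.

```python
import numpy as np

def gaussian_logpdf(v, mean, cov):
    """Log density of a multivariate Gaussian evaluated at vector v."""
    d = v - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(v) * np.log(2 * np.pi) + logdet + maha)

def posteriors(v, weights, means, covs):
    """Posterior probability of each GMM component given observation v."""
    logp = np.array([
        np.log(w) + gaussian_logpdf(v, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    logp -= logp.max()  # subtract the max for numerical stability
    p = np.exp(logp)
    return p / p.sum()

# Hypothetical joint GMM over z = [x, y] with 1-D source x and 1-D target y.
# The components nearly coincide in x but differ strongly in y.
weights = np.array([0.5, 0.5])
joint_means = np.array([[0.0, -2.0],
                        [0.2,  2.0]])
joint_covs = np.array([np.eye(2) * 0.5, np.eye(2) * 0.5])

# Source marginals: keep only the x part of each mean and covariance.
src_means = joint_means[:, :1]
src_covs = joint_covs[:, :1, :1]

# One aligned source/target frame pair (assumed values).
x = np.array([0.0])
y = np.array([2.0])

p_src = posteriors(x, weights, src_means, src_covs)
p_joint = posteriors(np.concatenate([x, y]), weights, joint_means, joint_covs)

k_src = int(np.argmax(p_src))      # class chosen from source information only
k_joint = int(np.argmax(p_joint))  # class chosen from joint source/target information

print("source-only posteriors:", p_src, "-> component", k_src)
print("joint posteriors:      ", p_joint, "-> component", k_joint)
```

With these assumed parameters the two classifications disagree: the source frame alone is nearly ambiguous between the components, while the paired target frame clearly selects the other one. This kind of mismatch is what a one-to-many effect looks like in a purely acoustic GMM.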
Cite as: Godoy, E., Rosec, O., Chonavel, T. (2009) Alleviating the one-to-many mapping problem in voice conversion with context-dependent modeling. Proc. Interspeech 2009, 1627-1630, doi: 10.21437/Interspeech.2009-486
@inproceedings{godoy09_interspeech,
  author={Elizabeth Godoy and Olivier Rosec and Thierry Chonavel},
  title={{Alleviating the one-to-many mapping problem in voice conversion with context-dependent modeling}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1627--1630},
  doi={10.21437/Interspeech.2009-486}
}