This paper addresses the “one-to-many” mapping problem in Voice Conversion (VC) by exploring source-to-target mappings in GMM-based spectral transformation. Specifically, we examine the differences between using source-only and joint source/target information in the classification stage of transformation, illustrating a “one-to-many effect” in the traditional acoustically-based GMM. We propose combating this effect by using phonetic information in GMM learning and classification. We then show the success of the proposed context-dependent modeling with transformation results evaluated using an objective error criterion. Finally, we discuss the implications of our work for adapting current approaches to VC.
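To make the contrast between source-only and joint classification concrete, the following is a minimal sketch and not the paper's implementation. It assumes one-dimensional source and target features, diagonal covariances, and a hypothetical two-component joint GMM whose components share the same source mean but differ in target mean. All parameter values and function names are invented for illustration.

```python
import math

def gauss(v, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Hypothetical toy joint GMM over (source x, target y), diagonal covariance.
# Both components look identical from the source side (mu_x = 0) but map to
# different target regions: a toy instance of a one-to-many mapping.
COMPONENTS = [
    {"w": 0.5, "mu_x": 0.0, "mu_y": -2.0, "var_x": 1.0, "var_y": 1.0},
    {"w": 0.5, "mu_x": 0.0, "mu_y": 2.0, "var_x": 1.0, "var_y": 1.0},
]

def normalize(scores):
    """Scale non-negative scores so that they sum to one."""
    total = sum(scores)
    return [s / total for s in scores]

def posterior_source_only(x):
    """P(component | x): classification that sees only the source frame."""
    return normalize(
        [c["w"] * gauss(x, c["mu_x"], c["var_x"]) for c in COMPONENTS]
    )

def posterior_joint(x, y):
    """P(component | x, y): classification that sees the aligned pair."""
    return normalize(
        [
            c["w"] * gauss(x, c["mu_x"], c["var_x"]) * gauss(y, c["mu_y"], c["var_y"])
            for c in COMPONENTS
        ]
    )

if __name__ == "__main__":
    print(posterior_source_only(0.1))
    print(posterior_joint(0.1, 1.8))
```

In this toy setup the source frame alone cannot separate the two components, while the joint source/target pair selects one of them almost with certainty. Additional information that separates such components at classification time, such as the phonetic context proposed here, is intended to reduce this ambiguity when the target frame is not available.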
Bibliographic reference. Godoy, Elizabeth / Rosec, Olivier / Chonavel, Thierry (2009): "Alleviating the one-to-many mapping problem in voice conversion with context-dependent modeling", In INTERSPEECH-2009, 1627-1630.