8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Including Dynamic and Phonetic Information in Voice Conversion Systems

Antonio Bonafonte (1), Alexander Kain (2), Jan van Santen (2), Helenca Duxans (1)

(1) Technical University of Catalonia (UPC), Spain
(2) Oregon Health & Science University, USA

Voice Conversion (VC) systems modify a speaker's voice (the source speaker) so that it is perceived as if another speaker (the target speaker) had uttered it. Previously published VC approaches based on Gaussian Mixture Models (GMMs) perform the conversion on a frame-by-frame basis using only spectral information. In this paper, two new approaches for extending GMM-based VC systems are studied. First, dynamic information is used to build the acoustic model of each speaker, so that the transformation is carried out on sequences of frames rather than on isolated frames. Second, phonetic information is introduced into the training of the VC system. Objective and perceptual evaluations compare the performance of the proposed systems.
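To make the frame-by-frame GMM conversion and the idea of adding dynamic information more concrete, here is a minimal sketch. It assumes the common joint-density GMM formulation, in which each converted frame is a posterior-weighted sum of per-component linear regressions, and it assumes the GMM parameters (weights, means, covariances) have already been trained, e.g. by EM on aligned source/target vectors. The abstract does not give the paper's exact equations, so this is not presented as the authors' method. The function names `gmm_convert` and `add_deltas` are hypothetical, and appending central-difference deltas is just one simple, assumed way of letting each feature vector depend on neighbouring frames.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Frame-wise joint-density GMM mapping (illustrative sketch).

    x: (T, D) source feature frames
    weights: (M,) mixture weights
    mu_x, mu_y: (M, D) source / target component means
    cov_xx, cov_yx: (M, D, D) source and cross (target-source) covariances
    Returns (T, D) converted frames:
        y_t = sum_i p(i | x_t) [mu_y_i + cov_yx_i cov_xx_i^{-1} (x_t - mu_x_i)]
    """
    T, D = x.shape
    M = len(weights)

    # Log-likelihood of every frame under every component's source marginal.
    log_p = np.empty((T, M))
    for i in range(M):
        diff = x - mu_x[i]
        inv = np.linalg.inv(cov_xx[i])
        _, logdet = np.linalg.slogdet(cov_xx[i])
        maha = np.einsum("td,de,te->t", diff, inv, diff)
        log_p[:, i] = np.log(weights[i]) - 0.5 * (maha + logdet + D * np.log(2 * np.pi))

    # Posterior p(i | x_t), normalized stably by subtracting the row maximum.
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)

    # Posterior-weighted sum of per-component conditional means E[y | x, i].
    y = np.zeros_like(x, dtype=float)
    for i in range(M):
        regression = cov_yx[i] @ np.linalg.inv(cov_xx[i])
        cond_mean = mu_y[i] + (x - mu_x[i]) @ regression.T
        y += post[:, [i]] * cond_mean
    return y

def add_deltas(x):
    """Append central-difference delta features (assumed dynamic features).

    Each delta is 0.5 * (x[t+1] - x[t-1]); the first and last frames are
    repeated so the output has the same number of frames as the input.
    """
    padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    delta = 0.5 * (padded[2:] - padded[:-2])
    return np.hstack([x, delta])
```

In a frame-by-frame system, `gmm_convert` is applied directly to the spectral frames. In the dynamic variant sketched here, the GMM would instead be trained on `add_deltas(x)` vectors (of dimension 2D), so every vector carries information about its neighbours; after conversion one could keep only the static half of each output vector.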


Bibliographic reference.  Bonafonte, Antonio / Kain, Alexander / Santen, Jan van / Duxans, Helenca (2004): "Including dynamic and phonetic information in voice conversion systems", In INTERSPEECH-2004, 1193-1196.