INTERSPEECH 2004 - ICSLP
Voice Conversion (VC) systems modify the voice of one speaker (the source speaker) so that it is perceived as if another speaker (the target speaker) had uttered it. Previously published VC approaches based on Gaussian Mixture Models (GMMs) perform the conversion on a frame-by-frame basis, using only spectral information. In this paper, two new approaches are studied to extend GMM-based VC systems. First, dynamic information is used to build the speaker acoustic model, so that the transformation is carried out on sequences of frames rather than on isolated frames. Second, phonetic information is introduced into the training of the VC system. Objective and perceptual evaluations are used to compare the performance of the proposed systems.
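The frame-by-frame GMM mapping that the paper sets out to extend can be sketched as follows. This is a minimal sketch assuming the common joint-density GMM formulation and NumPy. The function names `gaussian_pdf`, `gmm_convert` and `stack_context`, and the idea of representing dynamic information by concatenating each frame with its neighbours, are illustrative assumptions, not the authors' implementation. Each converted vector is a posterior-weighted sum of per-component linear regressions from source to target space.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    # Solve a linear system instead of forming an explicit inverse.
    sol = np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ sol) / norm)


def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map one source vector x to the target space with a joint-density GMM.

    F(x) = sum_i p(i | x) * (mu_y[i] + cov_yx[i] @ inv(cov_xx[i]) @ (x - mu_x[i]))
    """
    m = len(weights)
    # Posterior probability of each mixture component given the source vector.
    lik = np.array([weights[i] * gaussian_pdf(x, mu_x[i], cov_xx[i]) for i in range(m)])
    post = lik / lik.sum()
    y = np.zeros(mu_y.shape[1])
    for i in range(m):
        # Component-wise linear regression, weighted by its posterior.
        y += post[i] * (mu_y[i] + cov_yx[i] @ np.linalg.solve(cov_xx[i], x - mu_x[i]))
    return y


def stack_context(frames, context=1):
    """Concatenate each frame with its +/- `context` neighbours (edge frames repeated).

    Hypothetical way to give the model dynamic information: the GMM would then
    be trained and applied on these stacked vectors instead of single frames.
    """
    t = frames.shape[0]
    padded = np.concatenate([np.repeat(frames[:1], context, axis=0),
                             frames,
                             np.repeat(frames[-1:], context, axis=0)])
    return np.stack([padded[i:i + 2 * context + 1].reshape(-1) for i in range(t)])


# Toy 1-D example with two hand-set components (parameters are made up).
weights = np.array([0.5, 0.5])
mu_x = np.array([[0.0], [4.0]])
mu_y = np.array([[1.0], [10.0]])
cov_xx = np.array([[[1.0]], [[1.0]]])
cov_yx = np.array([[[0.5]], [[0.5]]])
print(gmm_convert(np.array([2.0]), weights, mu_x, mu_y, cov_xx, cov_yx))
```

In a real system the parameters would not be hand-set: in the usual joint-density approach they are estimated by EM on time-aligned source/target vector pairs, and `stack_context` would be applied before both training and conversion.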
Bibliographic reference. Bonafonte, Antonio / Kain, Alexander / Santen, Jan van / Duxans, Helenca (2004): "Including dynamic and phonetic information in voice conversion systems", In INTERSPEECH-2004, 1193-1196.