September 22-25, 1997
This paper presents a new scheme for developing a voice conversion system that modifies the utterance of a source speaker to sound like speech from a target speaker. We refer to the method as Speaker Transformation Algorithm using Segmental Codebooks (STASC). Two new methods are described to perform the transformation of vocal tract and glottal excitation characteristics across speakers. In addition, the source speaker's general prosodic characteristics are modified using time-scale and pitch-scale modification algorithms. Informal listening tests suggest that convincing voice conversion is achieved while maintaining high speech quality. The performance of the proposed system is also evaluated on a standard Caussian mixture model based speaker identification system, and the results show that the transformed speech is assigned higher likelihood by the target speaker model when compared to the source model.
Bibliographic reference. Arslan, Levent M. / Talkin, David (1997): "Voice conversion by codebook mapping of line spectral frequencies and excitation spectrum", In EUROSPEECH-1997, 1347-1350.