Interspeech'2005 - Eurospeech
Voice conversion technologies transform the speaker-specific characteristics of speech while preserving the original content, and can be widely used in speech processing. Given the limited system resources of embedded concatenative speech synthesis in particular, voice conversion may reduce the memory consumption of the acoustic database: it enables the intra-gender or cross-gender generation of new voices from an existing high-quality voice.
Usually, voice conversion is based on the modification of spectral properties in combination with pitch manipulation. A simplified approach uses warping functions in the frequency domain that aim at a reverse vocal tract length normalization (VTLN). Even so, voice conversion itself adds a critical computational complexity, which conflicts with the practical constraints of typical embedded and mobile applications.
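As an illustration of frequency-domain warping, the following is a minimal sketch of a piecewise-linear warping function of the kind often used for VTLN. It is not taken from the paper: the function name `warp_frequency`, the warping factor `alpha`, the 8000 Hz Nyquist frequency, and the breakpoint parameter `knee` are all assumptions chosen for the example.

```python
def warp_frequency(f_hz, alpha, nyquist_hz=8000.0, knee=0.85):
    """Piecewise-linear frequency warping (illustrative assumption, not the paper's method).

    Frequencies below a breakpoint are scaled by alpha; above it, a second
    linear segment maps the remaining band so that the Nyquist frequency
    stays fixed and the mapping remains monotonic.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= f_hz <= nyquist_hz:
        raise ValueError("frequency must lie in [0, nyquist_hz]")

    # Breakpoint chosen so that alpha * f0 never exceeds knee * nyquist_hz.
    f0 = knee * nyquist_hz * min(1.0, 1.0 / alpha)
    if f_hz <= f0:
        return alpha * f_hz

    # Linear segment from (f0, alpha * f0) to (nyquist_hz, nyquist_hz).
    slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + slope * (f_hz - f0)


if __name__ == "__main__":
    for f in (500.0, 2000.0, 7000.0, 8000.0):
        print(f, warp_frequency(f, alpha=1.1))
```

A factor `alpha` above 1 shifts spectral features upward, roughly as a shorter vocal tract would, and a factor below 1 shifts them downward.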
The authors propose a novel approach to voice conversion that reuses features of a common speech codec. Such a codec is already available in typical mobile applications, and the resulting voice quality is widely accepted. The paper investigates the manipulation of the immittance spectral frequencies (ISF) provided by the Adaptive Multi-Rate Wideband codec (AMR-WB). The algorithm has been integrated into the embedded speech synthesizer microDRESS.
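The abstract does not specify how the ISF values are manipulated. The sketch below only illustrates the general idea of shifting codec spectral parameters directly instead of recomputing a spectrum. It assumes, for illustration, that the ISF vector can be treated as a list of ordered frequencies in Hz below 6400 Hz, and it ignores quantization and other details of the codec's actual representation. The function name `warp_isf`, the factor `alpha`, and the spacing `min_gap_hz` are hypothetical.

```python
def warp_isf(isf_hz, alpha, f_max_hz=6400.0, min_gap_hz=50.0):
    """Scale ordered spectral frequencies by alpha (illustrative assumption).

    Each value is clamped so that the result stays strictly increasing,
    keeps at least min_gap_hz between neighbours, and remains below
    f_max_hz. Ordered, in-band values are generally needed for
    line-spectral-type parameters to describe a stable synthesis filter.
    """
    n = len(isf_hz)
    if (n + 1) * min_gap_hz > f_max_hz:
        raise ValueError("min_gap_hz is too large for this many coefficients")

    warped = []
    prev = 0.0
    for i, f in enumerate(isf_hz):
        remaining = n - i                        # coefficients still to place, including this one
        lo = prev + min_gap_hz                   # keep distance to the previous value
        hi = f_max_hz - min_gap_hz * remaining   # leave room for the following values
        g = min(max(alpha * f, lo), hi)
        warped.append(g)
        prev = g
    return warped


if __name__ == "__main__":
    example = [400.0 * (i + 1) for i in range(15)]  # made-up input, not codec data
    print(warp_isf(example, alpha=1.2))
```

Because only a short parameter vector per frame is modified, a scheme of this kind adds very little computation on top of the decoding that the codec performs anyway.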
Bibliographic reference. Strecha, Guntram / Jokisch, Oliver / Eichner, Matthias / Hoffmann, Rüdiger (2005): "Codec integrated voice conversion for embedded speech synthesis", In INTERSPEECH-2005, 2589-2592.