ISCA Archive Interspeech 2005

Codec integrated voice conversion for embedded speech synthesis

Guntram Strecha, Oliver Jokisch, Matthias Eichner, Rüdiger Hoffmann

Voice conversion technologies transform the individual characteristics of speech patterns while preserving the original content, and can be widely used in speech processing. Given the limited system resources of embedded concatenative speech synthesis in particular, voice conversion may reduce the memory consumption of the acoustic database. It enables the intra-gender or cross-gender generation of new voices from an existing high-quality voice.

Usually, voice conversion is based on the modification of spectral properties in conjunction with pitch manipulation. A simplified approach applies warping functions in the frequency domain, aiming at a reverse vocal tract length normalization (VTLN). Consequently, voice conversion itself incurs a critical computational complexity that conflicts with the practical constraints of typical embedded and mobile applications.
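
To illustrate what a frequency-domain warping function can look like, here is a minimal sketch of a piecewise-linear warp of the kind commonly used for VTLN. It is not taken from the paper: the function name `warp_frequency`, the warping factor `alpha`, and the `knee` parameter are all assumptions chosen for illustration.

```python
def warp_frequency(f_hz, alpha, nyquist_hz, knee=0.85):
    """Piecewise-linear frequency warp (illustrative, not the paper's method).

    Below a breakpoint the frequency is scaled by `alpha`; above it, a second
    linear segment maps the Nyquist frequency onto itself so the warped axis
    keeps the same bandwidth. `alpha` > 1 moves spectral features upward,
    `alpha` < 1 moves them downward.
    """
    # Breakpoint chosen so that alpha * breakpoint stays below Nyquist.
    breakpoint_hz = knee * nyquist_hz * min(1.0, 1.0 / alpha)
    if f_hz <= breakpoint_hz:
        return alpha * f_hz
    # Straight line from (breakpoint, alpha * breakpoint) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * breakpoint_hz) / (nyquist_hz - breakpoint_hz)
    return alpha * breakpoint_hz + slope * (f_hz - breakpoint_hz)


if __name__ == "__main__":
    for f in (500.0, 2000.0, 7000.0, 8000.0):
        print(f, "->", round(warp_frequency(f, alpha=1.1, nyquist_hz=8000.0), 1))
```

Applying such a warp to every frame of a spectrum requires resampling the spectral envelope, which hints at why the per-frame cost matters on embedded hardware.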

The authors propose a novel approach to voice conversion that reuses features of a common speech codec. Such a codec is already available in typical mobile applications, and the resulting voice quality is widely accepted. The paper investigates the manipulation of the immittance spectral frequencies (ISF) provided by the Adaptive Multi-Rate Wideband (AMR-WB) codec. The resulting algorithm has been integrated into the embedded speech synthesizer microDRESS.
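
The abstract does not specify how the ISFs are manipulated. As a rough illustration of the general idea, the sketch below scales an ascending list of spectral frequencies by a factor and then restores the ordering and band limits that a codec would expect. Everything here is hypothetical: the function `shift_isf`, the factor `alpha`, the minimum spacing `min_gap_hz`, and the 6400 Hz upper limit are assumptions and are not taken from the paper or from the AMR-WB reference code.

```python
def shift_isf(isf_hz, alpha, nyquist_hz=6400.0, min_gap_hz=50.0):
    """Scale ascending spectral frequencies by `alpha` (illustrative only).

    Assumes `isf_hz` holds only the frequency-valued coefficients, in Hz and
    in ascending order, and that the list is short enough for all entries to
    fit below `nyquist_hz` with `min_gap_hz` spacing.
    """
    warped = []
    previous = 0.0
    # Forward pass: scale each frequency and keep a minimum distance
    # to its lower neighbour.
    for f in isf_hz:
        g = max(alpha * f, previous + min_gap_hz)
        warped.append(g)
        previous = g
    # Backward pass: push values down so the highest stays below the
    # upper band limit while the spacing is preserved.
    upper = nyquist_hz
    for i in range(len(warped) - 1, -1, -1):
        upper -= min_gap_hz
        warped[i] = min(warped[i], upper)
    return warped


if __name__ == "__main__":
    print(shift_isf([300.0, 800.0, 1500.0, 6000.0], alpha=1.1))
```

Because the frequencies are already decoded as part of normal codec operation, a manipulation of this kind adds only a few arithmetic operations per frame, which is the appeal of working inside the codec.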

doi: 10.21437/Interspeech.2005-802

Cite as: Strecha, G., Jokisch, O., Eichner, M., Hoffmann, R. (2005) Codec integrated voice conversion for embedded speech synthesis. Proc. Interspeech 2005, 2589-2592, doi: 10.21437/Interspeech.2005-802

  author={Guntram Strecha and Oliver Jokisch and Matthias Eichner and Rüdiger Hoffmann},
  title={{Codec integrated voice conversion for embedded speech synthesis}},
  booktitle={Proc. Interspeech 2005},