Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Spectral Voice Conversion Based on Unsupervised Clustering of Acoustic Space

Masoud Geravanchizadeh

Institute of Communication Acoustics, Ruhr-University Bochum, Bochum, Germany

Voice conversion systems aim to modify a source speaker's speech so that it is perceived as if a target speaker had spoken it. Applying voice conversion techniques to a concatenative text-to-speech synthesizer makes it possible to personalize such systems, so that additional voices can be produced quickly and automatically from a single source-speaker database. This paper presents a new, simple, and effective voice conversion algorithm designed to maintain high speech quality. Spectral conversion is performed by locally linear transformations, which are computed with the minimum mean square error (MMSE) method. The acoustic features used in the conversion are vocal tract parameters, represented by log area ratio coefficients. Listening tests show that the proposed algorithm converts speaker individuality while maintaining high speech quality.
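
The abstract names the ingredients (log area ratios as features, an unsupervised clustering of the acoustic space, and one MMSE-estimated linear transform per region) but not their details. The following Python/NumPy sketch is one plausible reading under assumptions that are ours, not the paper's: source and target frames are already time-aligned (for example by dynamic time warping); the clustering is hard k-means with farthest-point initialization; each local transform is affine and fitted by ordinary least squares, which is the MMSE estimate under a linear model; and log area ratios use the convention g = log((1 + k) / (1 - k)), although sign conventions differ between texts. All function names are hypothetical.

```python
import numpy as np


def reflection_to_lar(k):
    """Log area ratios from reflection coefficients (assumed convention: g = log((1 + k) / (1 - k)))."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))


def lar_to_reflection(g):
    """Inverse of reflection_to_lar: k = tanh(g / 2), which mathematically lies in (-1, 1) for any real g."""
    return np.tanh(np.asarray(g, dtype=float) / 2.0)


def assign(x, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(x, n_clusters, n_iter=50, seed=0):
    """Unsupervised clustering of the source feature space with Lloyd's k-means (an assumed choice)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random frame, then repeatedly
    # add the frame that is farthest from all centroids chosen so far.
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, n_clusters):
        d = ((x[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(x, centroids)
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return centroids


def _fit_affine(xs, ys):
    """Least-squares affine map ys ~ [xs, 1] @ w (the MMSE solution under a linear model)."""
    xa = np.hstack([xs, np.ones((len(xs), 1))])
    w, *_ = np.linalg.lstsq(xa, ys, rcond=None)
    return w


def train_local_transforms(src, tgt, n_clusters):
    """Fit one affine transform per source cluster from time-aligned (src, tgt) frame pairs."""
    centroids = kmeans(src, n_clusters)
    labels = assign(src, centroids)
    global_w = _fit_affine(src, tgt)  # fallback for sparsely populated clusters
    transforms = []
    for c in range(n_clusters):
        mask = labels == c
        if mask.sum() > src.shape[1]:  # at least dim + 1 frames for a well-posed fit
            transforms.append(_fit_affine(src[mask], tgt[mask]))
        else:
            transforms.append(global_w)
    return centroids, transforms


def convert(src, centroids, transforms):
    """Map each source frame with the transform of its nearest cluster (hard assignment)."""
    labels = assign(src, centroids)
    xa = np.hstack([src, np.ones((len(src), 1))])
    return np.stack([xa[i] @ transforms[labels[i]] for i in range(len(src))])
```

In use, one would compute LAR vectors for aligned source and target frames, call `train_local_transforms` once, and then `convert` new source frames before mapping them back with `lar_to_reflection`. Working in the LAR domain has a practical benefit: because `tanh(g / 2)` maps any real value into (-1, 1), every converted reflection coefficient stays inside the unit interval, so the resulting all-pole synthesis filter remains stable.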


Bibliographic reference: Geravanchizadeh, Masoud (2000): "Spectral voice conversion based on unsupervised clustering of acoustic space," in ICSLP-2000, vol. 3, pp. 614-617.