Sixth International Conference on Spoken Language Processing
Voice conversion systems aim to modify a source speaker's speech so that it is perceived as if a target speaker had spoken it. Applying voice conversion techniques to a concatenative text-to-speech synthesizer allows such systems to be personalized, so that additional voices can be produced quickly and automatically from a single source-speaker database. This paper presents a simple and effective voice conversion algorithm designed to maintain high speech quality. Spectral conversion is performed by locally linear transformations, which are computed by minimum mean square error (MMSE) estimation. The acoustic features included in the conversion are vocal tract parameters, represented by log area ratio coefficients. Evaluation by listening tests shows that the proposed algorithm makes it possible to convert speaker individuality while maintaining high quality.
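As an illustration of the locally linear mapping described above, the following Python sketch partitions source feature vectors into clusters and fits one affine transformation per cluster by least squares, which minimizes the mean square error over the training pairs. It is a minimal sketch under stated assumptions, not the paper's implementation: the k-means clustering, the hard cluster assignment, the use of time-aligned source and target frames, and all function names (`log_area_ratios`, `assign`, `kmeans`, `fit_local_transforms`, `convert`) are hypothetical choices made here, since the abstract does not specify them. The log area ratio helper uses one common sign convention; others exist.

```python
import numpy as np

def log_area_ratios(k):
    """Map reflection coefficients (|k| < 1) to log area ratios.
    Sign convention is an assumption; some texts use the inverse ratio."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) per row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, n_clusters, iters=20, seed=0):
    """Plain k-means, standing in for the unsupervised clustering step."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_clusters, replace=False)
    centers = X[idx].astype(float)
    for _ in range(iters):
        labels = assign(X, centers)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def _augment(X):
    """Append a column of ones so the fitted map includes a bias term."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_local_transforms(X, Y, centers):
    """Fit one affine map per cluster by least squares.
    X, Y: time-aligned source/target feature frames (assumed given)."""
    labels = assign(X, centers)
    # Global fit, used only as a fallback for clusters with no frames.
    W_global, *_ = np.linalg.lstsq(_augment(X), Y, rcond=None)
    transforms = []
    for j in range(len(centers)):
        mask = labels == j
        if not np.any(mask):
            transforms.append(W_global)
            continue
        W, *_ = np.linalg.lstsq(_augment(X[mask]), Y[mask], rcond=None)
        transforms.append(W)
    return transforms

def convert(X, centers, transforms):
    """Apply to each source frame the transform of its nearest cluster."""
    labels = assign(X, centers)
    Xa = _augment(X)
    out = np.empty((len(X), transforms[0].shape[1]))
    for j, W in enumerate(transforms):
        mask = labels == j
        out[mask] = Xa[mask] @ W
    return out
```

In use, `X` and `Y` would hold log area ratio vectors of aligned source and target frames; `kmeans(X, n_clusters)` yields the cluster centers, `fit_local_transforms` the per-cluster maps, and `convert` the converted source features. A hard nearest-cluster assignment keeps the sketch short, although it can introduce discontinuities at cluster boundaries.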
Bibliographic reference. Geravanchizadeh, Masoud (2000): "Spectral voice conversion based on unsupervised clustering of acoustic space", In ICSLP-2000, vol.3, 614-617.