This work presents a study of the GMM-based approach to full-quality speaker timbre conversion. In general, high-quality voice conversion requires accurate spectral envelope estimates, which result in high-dimensional feature vectors and relatively high computational cost. To achieve low-dimensional processing, accurate envelope estimates of the speakers are mel-frequency scaled and projected onto the space defined by a subset of the principal components. The GMM-based feature conversion is then performed in the reduced space. Our experimental findings confirm that this strategy provides benefits, observed especially in the quality of the converted speech, together with a significant reduction in computational cost.
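The pipeline described above (PCA projection, then GMM-based conversion in the reduced space) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the input rows are envelope feature vectors that have already been mel-frequency scaled, it uses the commonly cited joint-density GMM regression form for the conversion step, and all function and parameter names (`fit_pca`, `project`, `reconstruct`, `gmm_convert`, `S_xx`, `S_yx`) are hypothetical. The GMM parameters are assumed to have been trained elsewhere.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean and the first k principal directions of X (N x D)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T                      # shapes (D,), (D, k)

def project(X, mean, W):
    """Map full-dimensional features to the reduced k-dimensional space."""
    return (X - mean) @ W

def reconstruct(Z, mean, W):
    """Map reduced features back to the full-dimensional space."""
    return Z @ W.T + mean

def gmm_convert(Zx, weights, mu_x, mu_y, S_xx, S_yx):
    """Convert source features Zx (N x k) with an M-component joint GMM.

    Assumed form: y = sum_m p(m|x) * (mu_y[m] + S_yx[m] S_xx[m]^-1 (x - mu_x[m])).
    """
    N, k = Zx.shape
    M = len(weights)
    log_p = np.empty((N, M))
    for m in range(M):
        d = Zx - mu_x[m]
        sol = np.linalg.solve(S_xx[m], d.T).T
        _, logdet = np.linalg.slogdet(S_xx[m])
        # Log of weight times Gaussian density of the source marginal.
        log_p[:, m] = np.log(weights[m]) - 0.5 * (
            np.sum(d * sol, axis=1) + logdet + k * np.log(2 * np.pi)
        )
    # Normalize in the log domain to get component posteriors p(m|x).
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)

    Zy = np.zeros((N, mu_y.shape[1]))
    for m in range(M):
        A = S_yx[m] @ np.linalg.inv(S_xx[m])   # per-component linear map
        Zy += post[:, [m]] * (mu_y[m] + (Zx - mu_x[m]) @ A.T)
    return Zy
```

In such a sketch, source frames would be passed through `project`, converted with `gmm_convert`, and mapped back to envelope space with `reconstruct` using the target speaker's PCA basis. Because the covariance operations scale with the reduced dimension `k` rather than the full envelope dimension, this is where the computational saving would come from.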
Index Terms: Speech synthesis, speech analysis, linear prediction, pattern recognition
Cite as: Villavicencio, F., Maestre, E. (2010) GMM-PCA based speaker-timbre conversion on full-quality speech. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 56-61
@inproceedings{villavicencio10_ssw, author={Fernando Villavicencio and Esteban Maestre}, title={{GMM-PCA based speaker-timbre conversion on full-quality speech}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={56--61} }