This paper explores the benefits of transforming spectral peaks in voice conversion. First, by examining classic GMM-based transformation of cepstral coefficients, we show that the lack of variance in the transformed data ("over-smoothing") can be related to the choice of spectral parameterization. Consequently, we propose an alternative parameterization based on spectral peaks. The peaks are transformed using HMMs with Gaussian state distributions. Two learning variants are examined, along with a post-processing step that accounts for the evolution of peaks over time. A comparison of the transformation approaches shows that spectral peaks offer higher interspeaker feature correlation and yield higher transformed data variance than their cepstral coefficient counterparts.
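As a hedged illustration of how interspeaker correlation relates to over-smoothing (this is not the paper's implementation), consider a minimal sketch that assumes a single joint Gaussian over scalar source and target features, i.e. a one-component GMM. Under that assumption the conditional-mean mapping reduces to a linear regression, and the variance of the converted feature equals the squared source-target correlation times the target variance. The function names and the synthetic data below are hypothetical.

```python
import random
import statistics


def conditional_mean_map(xs, ys):
    """Return f(x) = E[y | x] under a single joint Gaussian fit to (xs, ys).

    Assumption: one Gaussian component, scalar features. With several
    components, a GMM mapping is a posterior-weighted sum of such terms.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / len(xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def variance_ratio(rho, n=20000, seed=0):
    """Variance of converted features divided by variance of true targets.

    Synthetic (hypothetical) data: unit-variance source and target
    features whose population correlation is rho.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_scale = (1.0 - rho ** 2) ** 0.5
    ys = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    convert = conditional_mean_map(xs, ys)
    converted = [convert(x) for x in xs]
    return statistics.pvariance(converted) / statistics.pvariance(ys)


if __name__ == "__main__":
    # The ratio tracks rho squared: weaker correlation, flatter output.
    for rho in (0.3, 0.6, 0.9):
        print(f"rho={rho:.1f}  variance ratio={variance_ratio(rho):.3f}")
```

Under these assumptions, weakly correlated features lose most of their variance after conversion, which is one way to read the motivation for seeking a parameterization with higher interspeaker correlation.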
Index Terms: voice conversion, spectral transformation, spectral peaks
Cite as: Godoy, E., Rosec, O., Chonavel, T. (2010) On transforming spectral peaks in voice conversion. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 68-73
@inproceedings{godoy10_ssw,
  author={Elizabeth Godoy and Olivier Rosec and Thierry Chonavel},
  title={{On transforming spectral peaks in voice conversion}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={68--73}
}