We describe experiments that quantify the effectiveness of whole-spectrum multiplicative scaling, with different scaling factors k, as a voice-conversion technique. A review of the literature indicated that the fundamental frequency of female excitation is typically about 1.7 times that of male excitation, whereas female formants are only about 1.16 times higher. A single, global setting of k can therefore only be a compromise between the competing requirements of scaling the excitation and the envelope parts of the spectrum correctly. Nonetheless, we show that the technique can achieve a useful degree of conversion. Female-to-male transformation was more successful than male-to-female in terms of perceived gender change, but male speech appeared to retain naturalness and intelligibility better when transformed.
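The compromise described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `scale_spectrum` and `residual_mismatch` are hypothetical, the use of linear interpolation over FFT bins is an assumption, and the geometric mean is only one possible way to pick a compromise k. Only the ratios 1.7 and 1.16 come from the abstract.

```python
import numpy as np

# Female-to-male ratios quoted in the abstract.
F0_RATIO = 1.7        # fundamental frequency (excitation)
FORMANT_RATIO = 1.16  # formant frequencies (spectral envelope)


def scale_spectrum(magnitude, k):
    """Stretch a magnitude spectrum along the frequency axis by a factor k.

    Assumption: output bin i takes the value the input had at bin i / k,
    found by linear interpolation. Output bins whose source lies above the
    last input bin are set to zero.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    bins = np.arange(magnitude.size)
    return np.interp(bins / k, bins, magnitude, right=0.0)


def residual_mismatch(k):
    """Return (excitation, formant) ratios of achieved to desired scaling
    for a male-to-female conversion with a single global factor k.
    A value of 1.0 means that part of the spectrum is scaled exactly."""
    return k / F0_RATIO, k / FORMANT_RATIO


if __name__ == "__main__":
    # One possible compromise (an assumption, not taken from the paper):
    # the geometric mean of the two ratios.
    k = np.sqrt(F0_RATIO * FORMANT_RATIO)
    excitation_error, formant_error = residual_mismatch(k)
    print(f"k = {k:.3f}")
    print(f"excitation scaled to {excitation_error:.2f} of target")
    print(f"formants scaled to {formant_error:.2f} of target")
```

With any single k, one of the two values returned by `residual_mismatch` differs from 1.0: setting k = 1.7 matches the excitation but overshoots the formants, and setting k = 1.16 does the reverse.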
Cite as: Chan, P.A., Damper, R.I. (1994) Voice conversion by whole-spectrum scaling. Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, 165-168
@inproceedings{chan94_asriv,
  author={P. A. Chan and Robert I. Damper},
  title={{Voice conversion by whole-spectrum scaling}},
  year=1994,
  booktitle={Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification},
  pages={165--168}
}