Dynamic Frequency Warping (DFW) offers an appealing alternative to GMM-based voice conversion, which suffers from "over-smoothing" that degrades speech quality. However, to adjust spectral power after DFW, previous work falls back on GMM-transformation. This paper proposes a more effective DFW with amplitude scaling (DFWA) that operates at the acoustic-class level and is independent of GMM-transformation. For each class, the amplitude scaling compares the average target log amplitude spectrum with the average warped source log amplitude spectrum. DFWA outperforms the GMM in both speech quality and timbre conversion, as confirmed by objective and subjective tests. Moreover, DFWA performs equivalently with parallel or nonparallel corpora.
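As a rough illustration of the class-level amplitude scaling described above, the following minimal sketch computes one correction per acoustic class. The abstract does not give the exact formula, so the sketch assumes that the correction is the difference between the class means of the target and warped source log amplitude spectra, and that it is added to each warped frame in the log domain. The function names (`class_means`, `amplitude_scaling`, `apply_scaling`) and the list-of-lists frame layout are hypothetical. Because the two averages are computed separately, no frame-by-frame alignment between source and target is required.

```python
def class_means(frames, class_ids):
    """Average log amplitude spectrum per acoustic class."""
    sums, counts = {}, {}
    for frame, c in zip(frames, class_ids):
        if c not in sums:
            sums[c] = [0.0] * len(frame)
            counts[c] = 0
        for k, value in enumerate(frame):
            sums[c][k] += value
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def amplitude_scaling(warped_src_frames, src_classes, tgt_frames, tgt_classes):
    """Assumed correction: mean target minus mean warped source, per class."""
    src_mean = class_means(warped_src_frames, src_classes)
    tgt_mean = class_means(tgt_frames, tgt_classes)
    # Only classes observed on both sides receive a correction.
    return {
        c: [t - s for t, s in zip(tgt_mean[c], src_mean[c])]
        for c in src_mean
        if c in tgt_mean
    }


def apply_scaling(warped_frame, class_id, scaling):
    """Add the class correction to one warped frame (log domain)."""
    return [w + a for w, a in zip(warped_frame, scaling[class_id])]


# Toy data: 2 frequency bins, 2 acoustic classes (values are log amplitudes).
warped = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
warped_classes = [0, 0, 1]
target = [[4.0, 6.0], [1.0, 1.0], [3.0, 3.0]]
target_classes = [0, 1, 1]

scaling = amplitude_scaling(warped, warped_classes, target, target_classes)
converted = apply_scaling(warped[0], warped_classes[0], scaling)
```

In this toy setup, the source and target sets have different class counts and no paired frames, which mirrors the nonparallel case in spirit only; how classes are assigned to frames is outside the scope of the sketch.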
Bibliographic reference. Godoy, Elizabeth / Rosec, Olivier / Chonavel, Thierry (2011): "Spectral envelope transformation using DFW and amplitude scaling for voice conversion with parallel or nonparallel corpora", In INTERSPEECH-2011, 673-676.