12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Spectral Envelope Transformation Using DFW and Amplitude Scaling for Voice Conversion with Parallel or Nonparallel Corpora

Elizabeth Godoy (1), Olivier Rosec (1), Thierry Chonavel (2)

(1) Orange Labs, France
(2) Telecom Bretagne, France

Dynamic Frequency Warping (DFW) offers an appealing alternative to GMM-based voice conversion, which suffers from "over-smoothing" that degrades speech quality. However, to adjust spectral power after DFW, previous work falls back on GMM-transformation. This paper proposes a more effective DFW with amplitude scaling (DFWA) that operates at the acoustic class level and is independent of GMM-transformation. The amplitude scaling compares the average target and warped source log amplitude spectra for each class. DFWA outperforms the GMM in both speech quality and timbre conversion, as confirmed by objective and subjective tests. Moreover, DFWA performs equivalently with parallel or nonparallel corpora.
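
As a rough illustration of the class-level amplitude scaling described above, the following minimal sketch computes a per-bin log-domain offset as the difference between the average target and average warped-source log amplitude spectra of one acoustic class, then adds that offset to a warped source frame. The abstract does not give the exact formulation, so the additive log-domain correction, the function names (`mean_log_spectrum`, `class_log_scaling`, `apply_scaling`), and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
def mean_log_spectrum(frames):
    """Average log-amplitude spectrum over a list of equal-length frames."""
    n_bins = len(frames[0])
    return [sum(frame[k] for frame in frames) / len(frames) for k in range(n_bins)]


def class_log_scaling(target_frames, warped_source_frames):
    """Per-bin log-domain offset for one class (assumed form):
    mean target log spectrum minus mean warped-source log spectrum."""
    mean_target = mean_log_spectrum(target_frames)
    mean_warped = mean_log_spectrum(warped_source_frames)
    return [t - s for t, s in zip(mean_target, mean_warped)]


def apply_scaling(warped_frame, scaling):
    """Add the class offset to a warped source frame in the log domain."""
    return [x + g for x, g in zip(warped_frame, scaling)]


# Hypothetical toy data: log amplitudes for one acoustic class, 3 frequency bins.
target = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
warped = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]

scaling = class_log_scaling(target, warped)
converted = apply_scaling(warped[0], scaling)
```

Because the offset depends only on class averages, this sketch needs no frame-by-frame alignment between source and target, which is consistent with the abstract's point that the approach works with nonparallel corpora.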

Bibliographic reference.  Godoy, Elizabeth / Rosec, Olivier / Chonavel, Thierry (2011): "Spectral envelope transformation using DFW and amplitude scaling for voice conversion with parallel or nonparallel corpora", In INTERSPEECH-2011, 673-676.