12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Auditory Filterbank Improves Voice Morphing

Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara

Wakayama University, Japan

This paper presents a new method for vocal tract length (VTL) estimation and normalization, based on a gammachirp auditory filterbank (GCFB), that improves sound quality in voice morphing. VTL ratios among 28 speakers were estimated from the spectral distances for all 756 ordered speaker pairs (28P2 = 28 × 27 = 756). For comparison, VTL estimation using the mel-frequency filterbank (MFFB), the preprocessor used to calculate the MFCCs common in ASR, was also evaluated. The results of subjective listening tests on morphed voices with and without VTL normalization are also reported. The objective and subjective results indicate that VTL normalization is essential for voice morphing and that the proposed GCFB-based method outperforms the MFCC-based method.
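To illustrate the pairwise estimation described in the abstract, here is a minimal Python sketch. It is not the paper's GCFB or MFFB procedure: the filterbank front end is replaced by a synthetic spectral envelope, and the formant values, the frequency grid, the candidate ratios, and the function names (`base_envelope`, `make_speaker`, `estimate_vtl_ratio`) are all assumptions made for illustration. It only shows the general idea of enumerating ordered speaker pairs and picking the frequency-scaling ratio that minimizes a spectral distance.

```python
import math
from itertools import permutations

# 28 speakers, every ordered (source, target) pair: 28P2 = 28 * 27.
NUM_SPEAKERS = 28
ordered_pairs = list(permutations(range(NUM_SPEAKERS), 2))
print(len(ordered_pairs))  # 756

# Hypothetical formant centres (Hz) of a reference envelope; illustrative only.
FORMANTS_HZ = (700.0, 1800.0, 2600.0)

def base_envelope(freq_hz):
    """Toy spectral envelope: a sum of Gaussian bumps at FORMANTS_HZ."""
    return sum(math.exp(-0.5 * ((freq_hz - f) / 150.0) ** 2) for f in FORMANTS_HZ)

def make_speaker(relative_vtl):
    """Envelope of a toy speaker; a longer tract lowers every formant."""
    return lambda freq_hz: base_envelope(freq_hz * relative_vtl)

def estimate_vtl_ratio(env_src, env_tgt, freqs, candidates):
    """Grid search for the frequency scaling of env_src that best matches env_tgt."""
    def rms_distance(alpha):
        sq = sum((env_src(f * alpha) - env_tgt(f)) ** 2 for f in freqs)
        return math.sqrt(sq / len(freqs))
    return min(candidates, key=rms_distance)

freqs = [100.0 + 25.0 * k for k in range(157)]      # 100 Hz to 4000 Hz
candidates = [0.70 + 0.01 * k for k in range(61)]   # ratios 0.70 to 1.30

# Toy pair: the target's vocal tract is 1.2 times as long as the source's.
src, tgt = make_speaker(1.0), make_speaker(1.2)
estimated_ratio = estimate_vtl_ratio(src, tgt, freqs, candidates)
print(round(estimated_ratio, 2))
```

In this toy setup the ratio for a pair (a, b) is the reciprocal of the ratio for (b, a), so running the search over ordered rather than unordered pairs gives a simple consistency check on the estimates.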


Bibliographic reference.  Okamoto, Erika / Irino, Toshio / Nisimura, Ryuichi / Kawahara, Hideki (2011): "Auditory filterbank improves voice morphing", In INTERSPEECH-2011, 2517-2520.