This paper presents a new method for vocal tract length (VTL) estimation and normalization based on a gammachirp auditory filterbank (GCFB) to improve the sound quality of voice morphing. VTL ratios between 28 speakers were estimated from spectral distances for all ordered speaker pairs (28P2 = 28 × 27 = 756). For comparison, VTL estimation using the mel-frequency filterbank (MFFB), the preprocessor used to calculate the MFCCs common in ASR, was also evaluated. Results of subjective listening tests on morphed voices with and without VTL normalization are also reported. The objective and subjective results indicate that VTL normalization is essential for voice morphing and that the proposed GCFB-based method outperforms the MFCC-based method.
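The abstract does not specify how spectral distances are turned into VTL ratios. One common approach, used in ASR-style VTL normalization, is a frequency-warping grid search: warp one speaker's spectrum by a factor alpha and keep the alpha that minimizes the distance to the other speaker's spectrum. The Python sketch below illustrates that idea over every ordered speaker pair. It is not the authors' GCFB procedure. The linear-frequency toy spectra, the RMS distance, linear interpolation, the warp-factor grid, and all function names (`warp_spectrum`, `spectral_distance`, `estimate_warp_factor`, `estimate_all_pairs`, `toy_spectrum`) are hypothetical assumptions for illustration. In the paper, the spectra would instead come from GCFB or MFFB outputs.

```python
# Hypothetical sketch of a frequency-warping grid search; NOT the paper's GCFB method.
import itertools
import math


def warp_spectrum(spectrum, alpha):
    """Sample `spectrum` at bin positions alpha*k (k = 0..n-1) by linear interpolation.

    Positions at or beyond the last bin are clamped to the last value.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        x = alpha * k
        if x >= n - 1:
            warped.append(spectrum[-1])
        else:
            i = int(x)
            frac = x - i
            warped.append((1.0 - frac) * spectrum[i] + frac * spectrum[i + 1])
    return warped


def spectral_distance(a, b):
    """Root-mean-square difference between two equal-length spectra (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def estimate_warp_factor(ref, target, alphas):
    """Grid search for the alpha minimizing the distance between ref(k) and target(alpha*k).

    The returned alpha approximates the formant-frequency ratio target/ref. Under a
    uniform-tube assumption (formants inversely proportional to VTL), it is the
    inverse of the VTL ratio VTL_target / VTL_ref.
    """
    return min(alphas, key=lambda a: spectral_distance(ref, warp_spectrum(target, a)))


def estimate_all_pairs(spectra, alphas):
    """Estimate warp factors for every ordered speaker pair: n*(n-1) permutations."""
    return {
        (i, j): estimate_warp_factor(spectra[i], spectra[j], alphas)
        for i, j in itertools.permutations(range(len(spectra)), 2)
    }


def toy_spectrum(centers, width, n_bins=128):
    """Hypothetical test spectrum: a sum of Gaussian 'formant' peaks on a linear bin axis."""
    return [
        sum(math.exp(-0.5 * ((k - c) / width) ** 2) for c in centers)
        for k in range(n_bins)
    ]


if __name__ == "__main__":
    alphas = [round(0.80 + 0.01 * i, 2) for i in range(51)]  # grid 0.80 .. 1.30
    ref = toy_spectrum([20, 50, 80], width=4.0)
    # Toy target speaker whose formants are scaled by 1.2 (a shorter vocal tract).
    target = toy_spectrum([24, 60, 96], width=4.8)
    print(estimate_warp_factor(ref, target, alphas))
    print(len(list(itertools.permutations(range(28), 2))))  # 756 ordered pairs
```

Because each ordered pair is estimated separately, the factor for (j, i) should be roughly the reciprocal of the factor for (i, j), up to the grid resolution. Comparing the two gives a simple consistency check.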
Bibliographic reference. Okamoto, Erika / Irino, Toshio / Nisimura, Ryuichi / Kawahara, Hideki (2011): "Auditory filterbank improves voice morphing", In INTERSPEECH-2011, 2517-2520.