5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Normalization of Speaker Variability by Spectrum Warping for Robust Speech Recognition

Y.C. Chu, Charlie Jie, Vincent Tung, Ben Lin, Richard Lee

Technology Center Philips Taiwan, Taipei, Taiwan

This paper examines techniques for normalizing the variability of unseen speakers in speech recognition. Two implementations of linear spectrum warping are examined: time-domain resampling and filter bank scaling. It is shown that, for seen speakers, models trained on unwarped utterances are less sensitive to spectrum warping by filter bank scaling than to warping by resampling. A pitch-based scheme for warping factor estimation is proposed. The method is shown to be cost-effective in reducing the variability of unseen speakers compared with maximum-likelihood (ML) based methods. In particular, the combination of filter bank scaling with pitch-based warping factor estimation reduces the error rate of isolated Mandarin digit recognition by more than 30% for unseen speakers.
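
To make the idea of filter bank scaling concrete, the following is a minimal sketch rather than the authors' implementation. It linearly scales the centre frequencies of a mel-spaced filter bank by a warping factor `alpha`. The abstract does not state how pitch is mapped to a warping factor, so `pitch_based_alpha`, its `exponent`, and its clipping range are purely hypothetical placeholders; all function names are likewise assumptions made for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centre_frequencies(n_filters, f_low, f_high):
    """Centre frequencies (Hz) equally spaced on the mel scale."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(n_filters)]


def scale_filter_bank(centres_hz, alpha, f_max):
    """Linear spectrum warping: multiply each centre frequency by alpha.

    Centres pushed beyond f_max (e.g. the Nyquist frequency) are clipped.
    """
    return [min(alpha * f, f_max) for f in centres_hz]


def pitch_based_alpha(speaker_pitch_hz, reference_pitch_hz,
                      exponent=1.0 / 3.0, lo=0.8, hi=1.2):
    """HYPOTHETICAL pitch-to-warping-factor mapping (not from the paper).

    Assumes the factor grows with the ratio of the speaker's average pitch
    to a reference pitch, damped by an exponent and clipped to [lo, hi].
    """
    alpha = (speaker_pitch_hz / reference_pitch_hz) ** exponent
    return max(lo, min(hi, alpha))


if __name__ == "__main__":
    centres = mel_centre_frequencies(n_filters=8, f_low=0.0, f_high=4000.0)
    alpha = pitch_based_alpha(speaker_pitch_hz=200.0, reference_pitch_hz=120.0)
    warped = scale_filter_bank(centres, alpha, f_max=4000.0)
    for original, scaled in zip(centres, warped):
        print(f"{original:8.1f} Hz -> {scaled:8.1f} Hz")
```

Under these assumptions the warping is applied once, when the filter bank is built, so no resampling of the waveform is needed; the time-domain alternative mentioned in the abstract would instead resample the signal before feature extraction.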


Bibliographic reference. Chu, Y.C. / Jie, Charlie / Tung, Vincent / Lin, Ben / Lee, Richard (1997): "Normalization of speaker variability by spectrum warping for robust speech recognition", in EUROSPEECH-1997, 1127-1130.