September 22-25, 1997
This paper examines techniques for normalizing the speech of unseen speakers in speech recognition. Two implementations of linear spectrum warping are examined: time-domain resampling and filter bank scaling. It is shown that, for seen speakers, models trained on unwarped utterances are less sensitive to spectrum warping by filter bank scaling than to warping by resampling. A pitch-based scheme for warping factor estimation is proposed. The method is shown to reduce the variability of unseen speakers more cost-effectively than maximum-likelihood (ML) based methods. In particular, combining filter bank scaling with pitch-based warping factor estimation reduces the error rate of isolated Mandarin digit recognition by more than 30% for unseen speakers.
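To make the two ideas concrete, here is a minimal sketch, not the authors' implementation. It assumes a mel-spaced filter bank whose center frequencies are scaled linearly by a warping factor `alpha`. This is one simple reading of "filter bank scaling": the filters move and the waveform is not resampled. The abstract does not give the pitch-to-warping-factor formula. `pitch_based_alpha` is therefore a hypothetical stand-in: it compresses the speaker-to-reference F0 ratio with an assumed exponent `gamma` and clips the result to an assumed range `[lo, hi]`. All function names and parameter values here are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def scaled_filter_centers(num_filters, f_low, f_high, alpha):
    """Center frequencies (Hz) of a mel-spaced filter bank, linearly scaled by alpha.

    Each unwarped center f is moved to alpha * f, so the filters sample the
    speaker's spectrum at scaled frequencies (filter bank scaling) instead of
    the waveform being resampled in the time domain.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_filters + 1)
    centers = [mel_to_hz(m_low + (i + 1) * step) for i in range(num_filters)]
    return [alpha * f for f in centers]


def pitch_based_alpha(speaker_f0, reference_f0, gamma=0.33, lo=0.88, hi=1.12):
    """HYPOTHETICAL pitch-to-warping-factor mapping (not the paper's formula).

    Compresses the pitch ratio speaker_f0 / reference_f0 with the assumed
    exponent gamma, then clips it to the assumed range [lo, hi].
    """
    alpha = (speaker_f0 / reference_f0) ** gamma
    return min(hi, max(lo, alpha))


# Usage: a speaker averaging 220 Hz against an assumed 120 Hz reference.
alpha = pitch_based_alpha(220.0, 120.0)
centers = scaled_filter_centers(20, 0.0, 4000.0, alpha)
```

The appeal of this estimator, under these assumptions, is its cost. It needs one F0 estimate per speaker, whereas ML-based estimation typically scores the utterance once per candidate warping factor. Note that with `alpha > 1` the top filters can be pushed past `f_high`. Practical systems often use a piecewise-linear warp near the band edge for this reason, and the sketch omits that step.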
Bibliographic reference. Chu, Y.C. / Jie, Charlie / Tung, Vincent / Lin, Ben / Lee, Richard (1997): "Normalization of speaker variability by spectrum warping for robust speech recognition", In EUROSPEECH-1997, 1127-1130.