Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Exploiting Frequency-Scaling Invariance Properties of the Scale Transform for Automatic Speech Recognition

S. Umesh (1), Richard C. Rose (2), S. Parthasarathy (1)

(1) Indian Institute of Technology, Kanpur, India
(2) AT&T Labs - Research, Florham Park, NJ, USA

This paper presents an experimental study of the application of the scale transform to improve the performance of speaker-independent continuous speech recognition. Three major results are described. First, a comparison was made between the scale-transform-based magnitude cepstrum coefficients (STCC) and mel-scale filter bank cepstrum coefficients (MFCC) on a telephone-based connected digit recognition task. It was shown that the STCC can obtain a performance that is close to that of the MFCC. Second, a simple frequency-normalization procedure was applied to the scale-transform representation that improved performance on the connected digit recognition task with respect to the MFCC. Finally, in a more controlled experimental setting using the TIMIT database, it was shown that the application of phone-specific frequency warpings improved phone classification performance over using a single speaker-specific warping. This last result may have general implications for all frequency-warping-based speaker normalization procedures.
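
The frequency-scaling invariance named in the title can be illustrated with a small numerical sketch. It is not the STCC front end evaluated in the paper; it only checks the underlying property that the magnitude of the scale transform is unchanged when a spectrum X(f) is replaced by sqrt(alpha) X(alpha f). The function names, the toy log-Gaussian spectrum, the frequency range, and the value alpha = 1.2 are all assumptions chosen for illustration.

```python
import numpy as np

def scale_transform_magnitude(spectrum, f_min=50.0, f_max=20000.0, n=1024):
    """Approximate |D(c)| for a spectrum given as a callable of frequency.

    Substituting f = exp(u) turns the scale transform into an ordinary
    Fourier transform of X(exp(u)) * exp(u / 2), evaluated here with an FFT
    over a uniform grid in log-frequency (hypothetical grid limits).
    """
    u = np.linspace(np.log(f_min), np.log(f_max), n, endpoint=False)
    g = spectrum(np.exp(u)) * np.exp(u / 2.0)
    du = u[1] - u[0]
    return np.abs(np.fft.fft(g)) * du / np.sqrt(2.0 * np.pi)

def toy_spectrum(f):
    # Hypothetical smooth spectral bump centred near 1 kHz.
    return np.exp(-(np.log(f) - np.log(1000.0)) ** 2 / (2 * 0.3 ** 2))

alpha = 1.2  # assumed frequency-scaling factor between two speakers

def scaled_spectrum(f):
    # Energy-normalised, frequency-scaled copy of the toy spectrum.
    return np.sqrt(alpha) * toy_spectrum(alpha * f)

d_ref = scale_transform_magnitude(toy_spectrum)
d_scaled = scale_transform_magnitude(scaled_spectrum)

# Relative difference between the two scale-transform magnitudes.
max_rel_diff = np.max(np.abs(d_ref - d_scaled)) / np.max(d_ref)
print(max_rel_diff)
```

On the log-frequency axis the scaling becomes a pure shift by ln(alpha), which affects only the phase of the transform. The two spectra differ visibly as functions of frequency, while their scale-transform magnitudes should agree to numerical precision in this idealised setting.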

Bibliographic reference. Umesh, S. / Rose, Richard C. / Parthasarathy, S. (2000): "Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition", in ICSLP-2000, vol. 1, 301-304.