Sixth International Conference on Spoken Language Processing (ICSLP 2000)
Beijing, China
October 16-20, 2000
Exploiting Frequency-Scaling Invariance Properties of the Scale Transform for Automatic Speech Recognition
S. Umesh (1), Richard C. Rose (2), S. Parthasarathy (1)
(1) Indian Institute of Technology, Kanpur, India
(2) AT&T Labs - Research, Florham Park, NJ, USA
This paper presents an experimental study of the application of the
scale transform to improve the performance of speaker-independent
continuous speech recognition. Three major results are described. First, a
comparison was made between the scale-transform based
magnitude cepstrum coefficients (STCC) and mel-scale
filter bank cepstrum coefficients (MFCC) on a telephone-based
connected digit recognition task. The STCC were shown to achieve
performance close to that of the MFCC. Second, a simple frequency-normalization
procedure was applied to the scale-transform representation
that improved performance on the connected digit recognition
task with respect to the MFCC. Finally, in a more
controlled experimental setting using the TIMIT database, it
was shown that the application of phone-specific frequency
warpings improved phone classification performance over
using a single speaker-specific warping. This last result
may have general implications for all frequency warping
based speaker normalization procedures.
Bibliographic reference.
Umesh, S. / Rose, Richard C. / Parthasarathy, S. (2000):
"Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition",
In ICSLP-2000, vol. 1, 301-304.