ISCA Archive ICSLP 2000

Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition

S. Umesh, Richard C. Rose, S. Parthasarathy

This paper presents an experimental study of applying the scale transform to improve the performance of speaker-independent continuous speech recognition. Three major results are described. First, scale-transform-based magnitude cepstrum coefficients (STCC) were compared with mel-scale filter bank cepstrum coefficients (MFCC) on a telephone-based connected digit recognition task, and the STCC were shown to achieve performance close to that of the MFCC. Second, applying a simple frequency-normalization procedure to the scale-transform representation improved performance on the connected digit task relative to the MFCC. Finally, in a more controlled experimental setting using the TIMIT database, applying phone-specific frequency warpings was shown to improve phone classification performance over using a single speaker-specific warping. This last result may have general implications for all frequency-warping-based speaker normalization procedures.
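The frequency-scaling invariance that the title refers to can be checked numerically. The sketch below assumes the continuous scale transform in the form D(c) = (1/sqrt(2π)) ∫_0^∞ X(f) f^(-jc-1/2) df, applied to a spectral envelope X(f). If the envelope is warped as Y(f) = sqrt(α) X(αf), substituting g = αf gives D_Y(c) = α^(jc) D_X(c), so |D_Y(c)| = |D_X(c)|. The envelope, the warp factor `alpha`, and the names `envelope`, `warped_envelope` and `scale_transform` are hypothetical illustrations. This is not the STCC front end or the normalization procedure used in the paper.

```python
import numpy as np


def envelope(f, centers=(500.0, 1500.0, 2500.0), amps=(1.0, 0.6, 0.3), width=0.1):
    """Hypothetical 'formant-like' envelope: Gaussian bumps in log-frequency,
    so it decays quickly at both ends of the frequency axis."""
    logf = np.log(f)
    out = np.zeros_like(logf)
    for fc, a in zip(centers, amps):
        out += a * np.exp(-0.5 * ((logf - np.log(fc)) / width) ** 2)
    return out


def scale_transform(spectrum, c, f_min=50.0, f_max=20000.0, n=8192):
    """Approximate D(c) = (1/sqrt(2*pi)) * int_0^inf X(f) f^(-jc-1/2) df.

    With f = exp(u) this becomes a Fourier integral in log-frequency:
    D(c) = (1/sqrt(2*pi)) * int X(e^u) e^(u/2) e^(-jcu) du,
    evaluated here as a Riemann sum on a uniform grid in u.
    """
    u = np.linspace(np.log(f_min), np.log(f_max), n)
    du = u[1] - u[0]
    g = spectrum(np.exp(u)) * np.exp(u / 2.0)   # X(e^u) * e^(u/2)
    kernel = np.exp(-1j * np.outer(c, u))       # e^(-jcu), shape (len(c), n)
    return (kernel @ g) * du / np.sqrt(2.0 * np.pi)


# Hypothetical warp factor: alpha > 1 moves every peak down by the factor alpha.
alpha = 1.2
c = np.linspace(0.0, 20.0, 41)


def warped_envelope(f):
    """Energy-normalised frequency scaling: Y(f) = sqrt(alpha) * X(alpha * f)."""
    return np.sqrt(alpha) * envelope(alpha * f)


DX = scale_transform(envelope, c)
DY = scale_transform(warped_envelope, c)

# The warped spectrum itself differs a lot from the original...
f_grid = np.linspace(50.0, 4000.0, 400)
spec_diff = np.max(np.abs(envelope(f_grid) - warped_envelope(f_grid)))
# ...but the scale-transform magnitudes agree up to numerical error.
scale_diff = np.max(np.abs(np.abs(DX) - np.abs(DY)))
print(f"max |X(f) - Y(f)|         = {spec_diff:.3f}")
print(f"max ||D_X(c)| - |D_Y(c)|| = {scale_diff:.2e}")
```

Because the warp only multiplies D(c) by the unit-modulus factor α^(jc), taking the magnitude discards it. In a real front end the spectrum is sampled and band-limited, so this invariance would hold only approximately.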


Cite as: Umesh, S., Rose, R.C., Parthasarathy, S. (2000) Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 301-304

@inproceedings{umesh00_icslp,
  author={S. Umesh and Richard C. Rose and S. Parthasarathy},
  title={{Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 301-304}
}