This paper presents an experimental study of applying the scale transform to improve the performance of speaker-independent continuous speech recognition. Three major results are described. First, scale-transform based magnitude cepstrum coefficients (STCC) were compared with mel-scale filter bank cepstrum coefficients (MFCC) on a telephone-based connected digit recognition task. The STCC obtained performance close to that of the MFCC. Second, a simple frequency-normalization procedure applied to the scale-transform representation improved performance on the connected digit recognition task relative to the MFCC. Finally, in a more controlled experimental setting using the TIMIT database, applying phone-specific frequency warpings improved phone classification performance over using a single speaker-specific warping. This last result may have general implications for all frequency-warping based speaker normalization procedures.
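The frequency-scaling invariance named in the title can be illustrated with a minimal numerical sketch. It is not the STCC front end evaluated in the paper, whose details the abstract does not give. It assumes the standard definition of the scale transform, D(c) = (1/sqrt(2*pi)) * integral of f(t) * t^(-jc - 1/2) dt over t > 0, and evaluates it by substituting t = exp(x). The toy log-Gaussian "spectrum", the scaling factor `alpha`, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np

def scale_transform_magnitude(f, c_values, x_min=-8.0, x_max=8.0, n=4096):
    """Magnitude of the scale transform of f, evaluated at each c in c_values.

    With t = exp(x), the transform becomes an ordinary Fourier integral of
    f(exp(x)) * exp(x / 2) over x, approximated here by a Riemann sum.
    """
    x = np.linspace(x_min, x_max, n)
    dx = x[1] - x[0]
    integrand = f(np.exp(x)) * np.exp(x / 2.0)
    kernel = np.exp(-1j * np.outer(c_values, x))  # one row per value of c
    D = (kernel @ integrand) * dx / np.sqrt(2.0 * np.pi)
    return np.abs(D)

def spectrum(freq):
    # Toy spectral envelope (an assumption): Gaussian in log-frequency.
    return np.exp(-np.log(freq) ** 2 / (2 * 0.5 ** 2))

alpha = 1.3  # hypothetical speaker-dependent frequency scaling factor

def scaled_spectrum(freq):
    # Frequency-scaled copy; sqrt(alpha) keeps the energy unchanged.
    return np.sqrt(alpha) * spectrum(alpha * freq)

c = np.linspace(0.0, 6.0, 13)
mag_ref = scale_transform_magnitude(spectrum, c)
mag_scaled = scale_transform_magnitude(scaled_spectrum, c)
max_diff = np.max(np.abs(mag_ref - mag_scaled))
print("largest magnitude difference:", max_diff)
```

Under these assumptions, scaling the frequency axis only multiplies D(c) by the phase factor alpha^(jc), so the two magnitude vectors agree up to numerical error. Features built from the magnitude are therefore unaffected by a uniform scaling of the spectrum.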
Cite as: Umesh, S., Rose, R.C., Parthasarathy, S. (2000) Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 301-304, doi: 10.21437/ICSLP.2000-75
@inproceedings{umesh00_icslp,
  author={S. Umesh and Richard C. Rose and S. Parthasarathy},
  title={{Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={1},
  pages={301--304},
  doi={10.21437/ICSLP.2000-75}
}