In this study, the performance of an auditory-model feature-extraction 'front end' was assessed in an isolated-word speech recognition task using a common hidden Markov model (HMM) 'back end', and compared with the performance of other front-end feature representations, including mel-frequency cepstral coefficients (MFCC) and two variants (J- and L-) of the relative spectral amplitude (RASTA) technique. The recognition task was performed in the presence of varying levels and types of additive noise and spectral distortion using standard HMM whole-word models with the Bellcore Digit database as a corpus. While all front ends achieved comparable recognition performance on clean speech, the performance of the auditory-model front end was generally significantly higher than that of the other methods in recognition tasks involving background noise or spectral distortion. Training HMMs with speech processed by the auditory-model or L-RASTA front end in one type of noise also improved recognition performance in other kinds of noise. This 'cross-training' effect did not occur with the MFCC or J-RASTA front ends.
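For orientation, the MFCC baseline front end mentioned above can be sketched as follows. This is a generic, minimal illustration of the standard MFCC pipeline (pre-emphasis, windowed framing, power spectrum, triangular mel filterbank, log, DCT), not the authors' implementation; the sampling rate, frame sizes, and coefficient counts are assumed values chosen for telephone-bandwidth speech.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_ceps=13):
    """Compute MFCC features; parameters here are illustrative assumptions."""
    # Pre-emphasis to flatten the spectral tilt of voiced speech
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping Hamming-windowed frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Short-time power spectrum
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    # Triangular filters spaced uniformly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT-II to decorrelate;
    # keep the first n_ceps cepstral coefficients
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return log_energy @ dct.T
```

RASTA-style variants additionally band-pass filter the trajectory of each spectral channel over time to suppress slowly varying convolutional (channel) distortion, which is one reason they are compared against MFCC under spectral distortion here.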
Cite as: Hunke, M., Hyun, M., Love, S., Holton, T. (1998) Improving the noise and spectral robustness of an isolated-word recognizer using an auditory-model front end. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0715, doi: 10.21437/ICSLP.1998-309
@inproceedings{hunke98_icslp,
  author={Martin Hunke and Meeran Hyun and Steve Love and Thomas Holton},
  title={{Improving the noise and spectral robustness of an isolated-word recognizer using an auditory-model front end}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0715},
  doi={10.21437/ICSLP.1998-309}
}