5th International Conference on Spoken Language Processing
In this study, the performance of an auditory-model feature-extraction 'front end' was assessed in an isolated-word speech recognition task using a common hidden Markov model (HMM) 'back end', and compared with that of other feature-representation front ends, including mel-frequency cepstral coefficients (MFCC) and two variants (J- and L-) of the relative spectral amplitude (RASTA) technique. The recognition task was performed in the presence of varying levels and types of additive noise and spectral distortion, using standard HMM whole-word models with the Bellcore Digit database as a corpus. While all front ends achieved comparable recognition performance in clean speech, the performance of the auditory-model front end was generally significantly higher than that of the other methods in recognition tasks involving background noise or spectral distortion. Training HMMs with speech processed by the auditory-model or L-RASTA front end in one type of noise also improved recognition performance in other kinds of noise. This 'cross-training' effect did not occur with the MFCC or J-RASTA front ends.
Bibliographic reference. Hunke, Martin / Hyun, Meeran / Love, Steve / Holton, Thomas (1998): "Improving the noise and spectral robustness of an isolated-word recognizer using an auditory-model front end", In ICSLP-1998, paper 0715.