ISCA Archive SAPA 2004

Auditory-based automatic speech recognition

Werner Hemmert, Marcus Holmberg, David Gelbart

In this paper we develop a physiologically motivated model of peripheral auditory processing and evaluate how the different processing steps influence automatic speech recognition in noise. The model features large dynamic compression (>60 dB) and a realistic sensory cell model. The compression range was well matched to the limited dynamic range of the sensory cells and the model yielded surprisingly high recognition scores. We also developed a computationally efficient simplified model of auditory processing and found that a model of adaptation could improve recognition accuracy. Adaptation is a basic principle of neuronal processing, which accentuates signal onsets. Applying this adaptation model to mel-frequency cepstral coefficient (MFCC) feature extraction enhanced recognition accuracy in noise (AURORA 2 task, averaged recognition scores) from 56.4% to 75.6% (clean training condition), a relative improvement of 41% in word error rate. Adaptation outperformed RASTA processing by more than 10 percentage points, which corresponds to a relative improvement of 31%.
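The onset-accentuating effect of adaptation can be illustrated with a minimal sketch. This is not the authors' adaptation model, which the abstract does not specify. It assumes a simple first-order scheme that subtracts an exponentially smoothed running average from one feature channel; the function name `adapt` and the smoothing factor `alpha` are hypothetical choices made for illustration.

```python
def adapt(track, alpha=0.9):
    """Subtract an exponentially smoothed running average from a feature track.

    track: per-frame values of one feature channel (e.g. a log mel-band energy).
    alpha: smoothing factor in [0, 1); an assumed value, not taken from the paper.
    """
    out = []
    mean = track[0]  # initialise the running average with the first frame
    for x in track:
        out.append(x - mean)                   # response relative to recent history
        mean = alpha * mean + (1 - alpha) * x  # slowly update the running average
    return out


# A step input: five frames of silence followed by a sustained sound.
step = [0.0] * 5 + [1.0] * 20
response = adapt(step)
```

For this step input the response is zero during the silence, jumps to 1.0 at the onset frame, and then decays geometrically while the input stays constant, so the onset is emphasized relative to the steady-state portion.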

Cite as: Hemmert, W., Holmberg, M., Gelbart, D. (2004) Auditory-based automatic speech recognition. Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004), paper 74

  author={Werner Hemmert and Marcus Holmberg and David Gelbart},
  title={{Auditory-based automatic speech recognition}},
  booktitle={Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004)},
  pages={paper 74}