INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

The Value of Auditory Offset Adaptation and Appropriate Acoustic Modeling

Huan Wang (1), David Gelbart (2), Hans-Günter Hirsch (3), Werner Hemmert (4)

(1) Infineon Technologies AG, Germany; (2) ICSI, USA; (3) Hochschule Niederrhein, Germany; (4) Technische Universität München, Germany

A critical step in encoding sound for neuronal processing occurs when the analog pressure wave is coded into discrete nerve-action potentials. Recent pool models of the inner hair cell synapse do not reproduce the dead time period after an intense stimulus, so we used visual inspection and automatic speech recognition (ASR) to investigate an offset adaptation (OA) model proposed by Zhang et al. [1].

OA improved phase locking in the auditory nerve (AN) and raised ASR accuracy for features derived from AN fibers (ANFs). We also found that OA is crucial for auditory processing by onset neurons (ONs) in the next neuronal stage, the auditory brainstem. Multi-layer perceptrons (MLPs) performed much better than standard Gaussian mixture models (GMMs) for both our ANF-based and ON-based auditory features. Similar results were previously obtained with MSG (Modulation-filtered SpectroGram) auditory features [2]. We therefore believe that researchers working with novel features should consider trying MLPs.

References

  1. X. Zhang and L. H. Carney, "Analysis of models for the synapse between the inner hair cell and the auditory nerve," J. Acoust. Soc. Am., vol. 118, pp. 1540-1553, 2005.
  2. S. Sharma, D. Ellis, S. Kajarekar, P. Jain, and H. Hermansky, "Feature extraction using non-linear transformation for robust speech recognition on the Aurora database," in ICASSP, 2000.


Bibliographic reference. Wang, Huan / Gelbart, David / Hirsch, Hans-Günter / Hemmert, Werner (2008): "The value of auditory offset adaptation and appropriate acoustic modeling", in INTERSPEECH-2008, 902-905.