ISCA Archive ICSLP 1994

The auditory image model as a preprocessor for spoken language

Roy D. Patterson, Timothy R. Anderson, Michael Allerhand

In the auditory system, the primary fibres that encode the mechanical motion of the basilar partition are phase locked to that motion, and auditory processing in the mid-brain preserves this information, to varying degrees, up to the level of the inferior colliculus. We know that this timing information is used in the localisation of point sources [1] and it is probably also used to separate point sources from more diffuse background noise. The time intervals in these neural patterns are on the order of milliseconds and so traditional speech preprocessors (like MCC and MFCC systems), with frames on the order of 15 milliseconds, remove the time-interval information from the representation. The performance of these systems deteriorates badly when the speaker is in a noisy environment with competing sources. This suggests that we will eventually need to incorporate time-interval processing into speech recognition systems if we are to achieve the kind of noise resistance characteristic of human speech recognition. In this paper, we describe a) an auditory model designed to stabilise repeating time-interval patterns, b) the 'data-rate problem' associated with auditory models as speech preprocessors, c) a strategy for developing a noise-resistant auditory spectrogram for speech recognition, and d) recent recognition results with a monaural auditory spectrogram.


Cite as: Patterson, R.D., Anderson, T.R., Allerhand, M. (1994) The auditory image model as a preprocessor for spoken language. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 1395-1398

@inproceedings{patterson94b_icslp,
  author={Roy D. Patterson and Timothy R. Anderson and Michael Allerhand},
  title={{The auditory image model as a preprocessor for spoken language}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={1395--1398}
}