Third International Conference on Spoken Language Processing (ICSLP 94)
In the auditory system, the primary fibres that encode the mechanical motion of the basilar partition are phase-locked to that motion, and auditory processing in the mid-brain preserves this timing information, to varying degrees, up to the level of the inferior colliculus. We know that this timing information is used in the localisation of point sources, and it is probably also used to separate point sources from more diffuse background noise. The time intervals in these neural patterns are on the order of milliseconds, and so traditional speech preprocessors (like MCC and MFCC systems), with frames on the order of 15 milliseconds, remove the time-interval information from the representation. The performance of these systems deteriorates badly when the speaker is in a noisy environment with competing sources. This suggests that we will eventually need to incorporate time-interval processing into speech recognition systems if we are to achieve the kind of noise resistance characteristic of human speech recognition. In this paper, we describe a) an auditory model designed to stabilise repeating time-interval patterns, b) the 'data-rate problem' associated with auditory models as speech preprocessors, c) a strategy for developing a noise-resistant auditory spectrogram for speech recognition, and d) recent recognition results with a monaural auditory spectrogram.
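The point about frame-based front ends can be illustrated with a minimal sketch (not code from the paper; the sample rate, click-train stimulus, and frame length are assumptions chosen for illustration). Two click trains that differ only by a 1 ms delay are distinguishable sample-by-sample, yet a 15 ms frame-level representation of the kind used in conventional preprocessors is identical for both: the sub-frame time-interval structure is averaged away.

```python
import numpy as np

# Illustrative sketch, not the authors' model: shows how ~15 ms framing
# discards millisecond time-interval information.
fs = 16000                    # sample rate in Hz (assumed for illustration)
period = 80                   # 200 Hz click train: one click every 80 samples
n = 960                      # 60 ms of signal

x1 = np.zeros(n)
x1[::period] = 1.0            # clicks at t = 0, 5, 10, ... ms
x2 = np.roll(x1, 16)          # the same train delayed by 1 ms (16 samples)

frame = 240                   # 15 ms frames, as in a typical front end
e1 = x1.reshape(-1, frame).sum(axis=1)   # per-frame sums (energy, for unit clicks)
e2 = x2.reshape(-1, frame).sum(axis=1)

# The signals differ sample-by-sample, but the frame-level representations
# are identical: the 1 ms timing difference has been removed.
assert not np.array_equal(x1, x2)
assert np.array_equal(e1, e2)
```

A time-interval (e.g. autocorrelation- or interval-histogram-based) representation, by contrast, would preserve exactly the structure that this framing destroys, which is the motivation for the auditory image model described in the paper.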
Bibliographic reference. Patterson, Roy D. / Anderson, Timothy R. / Allerhand, Michael (1994): "The auditory image model as a preprocessor for spoken language", In ICSLP-1994, 1395-1398.