Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

The Auditory Image Model as a Preprocessor for Spoken Language

Roy D. Patterson (1), Timothy R. Anderson (2), Michael Allerhand (1)

(1) Medical Research Council, Applied Psychology Unit, Cambridge, UK
(2) Armstrong Laboratory, Bioacoustics and Biocommunications Branch, Wright-Patterson AFB, OH, USA

In the auditory system, the primary fibres that encode the mechanical motion of the basilar partition are phase locked to that motion, and auditory processing in the mid-brain preserves this information, to varying degrees, up to the level of the inferior colliculus. We know that this timing information is used in the localisation of point sources [1], and it is probably also used to separate point sources from more diffuse background noise. The time intervals in these neural patterns are on the order of milliseconds, so traditional speech preprocessors (like MCC and MFCC systems), with frames on the order of 15 milliseconds, remove the time-interval information from the representation. The performance of these systems deteriorates badly when the speaker is in a noisy environment with competing sources. This suggests that we will eventually need to incorporate time-interval processing into speech recognition systems if we are to achieve the kind of noise resistance characteristic of human speech recognition. In this paper, we describe a) an auditory model designed to stabilise repeating time-interval patterns, b) the 'data-rate problem' associated with auditory models as speech preprocessors, c) a strategy for developing a noise-resistant auditory spectrogram for speech recognition, and d) recent recognition results with a monaural auditory spectrogram.
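To make the timing argument concrete, the Python sketch below (ours, not from the paper) contrasts the two representations. It treats a half-wave rectified 200 Hz tone as a crude stand-in for a phase-locked neural activity pattern; the strobe-on-local-maxima rule is only a loose analogue of the strobed temporal integration in the auditory image model, not the authors' algorithm, and all signal parameters are illustrative assumptions.

    import numpy as np

    fs = 16000                                         # sample rate in Hz (assumed)
    t = np.arange(0, 0.1, 1.0 / fs)                    # 100 ms of signal
    x = np.maximum(np.sin(2 * np.pi * 200 * t), 0.0)   # half-wave rectified 200 Hz tone,
                                                       # a crude phase-locked firing pattern

    # Frame-based view: averaging over a 15 ms frame keeps the energy but
    # discards the 5 ms inter-peak interval that carries the periodicity.
    frame = x[: int(0.015 * fs)]
    print("frame energy:", np.mean(frame ** 2))

    # Interval view: 'strobe' on local maxima and histogram the intervals
    # between strobes, so the repeating 5 ms pattern survives as a stable peak.
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    intervals = 1000.0 * np.diff(peaks) / fs           # inter-strobe intervals in ms
    hist, edges = np.histogram(intervals, bins=np.arange(0.0, 10.5, 0.5))
    print("dominant interval (ms):", edges[np.argmax(hist)])   # ~5 ms for 200 Hz

The sketch also illustrates the data-rate problem mentioned above: the frame view reduces each 15 ms of signal to a single number, while the interval view retains millisecond structure and therefore a much higher data rate.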


Bibliographic reference. Patterson, Roy D. / Anderson, Timothy R. / Allerhand, Michael (1994): "The auditory image model as a preprocessor for spoken language", in ICSLP-1994, 1395-1398.