5th International Conference on Spoken Language Processing
Part of the difficulty in noise-robust speech recognition can be attributed to poor acoustic modeling and the use of inappropriate features. The human auditory system is known to outperform the best speech recognizers currently available. Hence, in this paper, we propose a new two-stream feature extractor that incorporates some of the key functions of the peripheral auditory subsystem. To enhance noise robustness, the input is divided into low-pass and high-pass channels to form so-called static and dynamic streams. These two streams are processed independently and then recombined into a single stream of 13-component feature vectors carrying improved linguistic information. In speaker-dependent isolated-word recognition tests, the proposed front-end yielded average error-rate reductions of 39% and 17% across all noisy environments, compared with standard Mel Frequency Cepstral Coefficient (MFCC) front-ends using 13 (statics only) and 26 (statics and deltas) feature vector components, respectively.
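The split-and-recombine idea can be illustrated with a minimal sketch. The abstract does not say which filters are used, whether filtering is applied to the waveform or to feature trajectories, or how each stream is processed. This sketch therefore assumes: filtering along time on each of 13 coefficient trajectories, a centered moving average as the low-pass ("static") channel, its residual as the complementary high-pass ("dynamic") channel, and a weighted sum as a stand-in for the paper's per-stream processing and recombination. The names `two_stream_split`, `recombine`, `width`, `w_static`, and `w_dynamic` are hypothetical and are not the authors' method.

```python
from typing import List, Tuple

Frame = List[float]  # one feature vector (13 components in the abstract's setup)


def moving_average(traj: List[float], width: int = 5) -> List[float]:
    """Centered moving average along time: an assumed, simple low-pass filter."""
    half = width // 2
    n = len(traj)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)  # window shrinks at edges
        out.append(sum(traj[lo:hi]) / (hi - lo))
    return out


def two_stream_split(frames: List[Frame], width: int = 5) -> Tuple[List[Frame], List[Frame]]:
    """Hypothetical split of each coefficient trajectory into a low-pass
    ("static") stream and a complementary high-pass ("dynamic") stream,
    where high-pass = original minus low-pass."""
    dim = len(frames[0])
    static = [[0.0] * dim for _ in frames]
    dynamic = [[0.0] * dim for _ in frames]
    for d in range(dim):
        traj = [f[d] for f in frames]
        low = moving_average(traj, width)
        for t, (x, l) in enumerate(zip(traj, low)):
            static[t][d] = l
            dynamic[t][d] = x - l
    return static, dynamic


def recombine(static: List[Frame], dynamic: List[Frame],
              w_static: float = 1.0, w_dynamic: float = 1.0) -> List[Frame]:
    """Placeholder recombination: a per-stream weighted sum, so the output
    keeps the input dimension (a single stream of vectors)."""
    return [[w_static * s + w_dynamic * d for s, d in zip(fs, fd)]
            for fs, fd in zip(static, dynamic)]


if __name__ == "__main__":
    frames = [[float((t * d) % 7) for d in range(13)] for t in range(20)]
    s, dyn = two_stream_split(frames)
    out = recombine(s, dyn, w_static=1.0, w_dynamic=0.5)
    print(len(out), len(out[0]))  # 20 frames, 13 components each
```

Because the two channels are complementary, recombining with both weights at 1.0 reproduces the input; any benefit would have to come from processing the streams differently before recombination, which this sketch only stands in for with scalar weights.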
Bibliographic reference. Tian, Jilei / Hariharan, Ramalingam / Laurila, Kari (1998): "Noise robust two-stream auditory feature extraction method for speech recognition", In ICSLP-1998, paper 0325.