September 22-25, 1997
Combining a model of auditory perception (PEMO) as feature extractor with a Locally Recurrent Neural Network (LRNN) as classifier yields promising ASR results in noise. Our study focuses on the interplay between the two techniques and their ability to complement each other in robust speech recognition. We performed recognition experiments with modifications of PEMO processing that concern amplitude compression and envelope modulation filtering. The results show that the distinct, sparse peaks of the PEMO speech representation, which are well maintained in noise, are sufficient cues for LRNN-based recognition, owing to the LRNN's ability to exploit information that is distributed over time. Enhanced envelope modulation bandpass filtering of the PEMO feature vectors better reflects the average modulation spectrum of speech and further decreases the influence of noise.
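As a rough illustration of what envelope modulation bandpass filtering of feature vectors can mean in practice, the following minimal sketch band-limits each feature channel along the time axis with a simple FFT mask. It is not the PEMO implementation: the function name `modulation_bandpass`, the 100 Hz frame rate, and the 2 to 8 Hz passband are assumptions chosen only for the demonstration, and the abstract does not state the filter design or cutoff frequencies that were actually used.

```python
import numpy as np

def modulation_bandpass(features, frame_rate_hz, low_hz, high_hz):
    """Band-pass filter each feature channel along time with an FFT mask.

    features: array of shape (n_frames, n_channels).
    All parameter names and values are illustrative assumptions.
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Modulation spectrum of every channel (FFT along the frame axis).
    spectrum = np.fft.rfft(features, axis=0)
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate_hz)
    # Zero every modulation frequency outside the passband.
    keep = (mod_freqs >= low_hz) & (mod_freqs <= high_hz)
    spectrum[~keep, :] = 0.0
    return np.fft.irfft(spectrum, n=n_frames, axis=0)

# Hypothetical one-channel feature trajectory, 2 s at 100 frames per second:
# a 4 Hz speech-like modulation plus a constant offset, a slow drift,
# and a fast fluctuation standing in for noise.
frame_rate_hz = 100.0
t = np.arange(200) / frame_rate_hz
speech_like = np.sin(2 * np.pi * 4.0 * t)
slow_drift = 0.5 * np.sin(2 * np.pi * 0.5 * t)
fast_jitter = 0.5 * np.sin(2 * np.pi * 30.0 * t)
features = (speech_like + slow_drift + fast_jitter + 1.0)[:, None]

filtered = modulation_bandpass(features, frame_rate_hz, low_hz=2.0, high_hz=8.0)
```

In this toy signal only the 4 Hz component falls inside the assumed passband, so the offset, the slow drift, and the fast fluctuation are removed. A real front end would more likely use a causal filter applied frame by frame rather than an FFT over the whole utterance.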
Bibliographic reference. Tchorz, Jurgen / Kasper, Klaus / Reininger, Herbert / Kollmeier, Birger (1997): "On the interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise", in EUROSPEECH-1997, 2075-2078.