5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

On the Interplay between Auditory-Based Features and Locally Recurrent Neural Networks for Robust Speech Recognition in Noise

Jürgen Tchorz (1), Klaus Kasper (2), Herbert Reininger (2), Birger Kollmeier (1)

(1) Carl von Ossietzky-Universität, AG Medizinische Physik, Oldenburg, Germany
(2) Institut für Angewandte Physik, Johann Wolfgang Goethe-Universität, Frankfurt, Germany

Combining a model of auditory perception (PEMO) as feature extractor with a Locally Recurrent Neural Network (LRNN) as classifier yields promising automatic speech recognition (ASR) results in noise. Our study focuses on the interplay between the two techniques and their ability to complement each other in robust speech recognition. We performed recognition experiments with modifications of the PEMO processing affecting amplitude compression and envelope modulation filtering. The results show that the distinct, sparse peaks of the PEMO speech representation, which are well maintained in noise, are sufficient cues for LRNN-based recognition because the LRNN can exploit information that is distributed over time. Enhanced envelope modulation bandpass filtering of the PEMO feature vectors better reflects the average modulation spectrum of speech and further decreases the influence of noise.
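
As a rough illustration of envelope modulation bandpass filtering, the sketch below filters one feature channel along time by subtracting a slow low-pass from a faster one, so that both very slow changes (including a constant offset) and very fast fluctuations are attenuated. The abstract does not specify the PEMO filter design; the one-pole filters, the 1 Hz and 8 Hz cutoffs, the 100 Hz frame rate, and the function names are all assumptions chosen only for illustration.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, rate_hz):
    """First-order recursive low-pass filter along the time axis.

    Hypothetical helper; not the filter used in PEMO.
    """
    if not signal:
        return []
    # Smoothing coefficient derived from the cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / rate_hz)
    state = signal[0]  # start at the first sample to avoid a start-up transient
    out = []
    for sample in signal:
        state += alpha * (sample - state)
        out.append(state)
    return out


def modulation_bandpass(envelope, low_hz=1.0, high_hz=8.0, rate_hz=100.0):
    """Band-pass one feature envelope (assumed cutoffs and frame rate).

    The difference of two low-pass outputs keeps modulations roughly
    between low_hz and high_hz and removes the constant component.
    """
    fast = one_pole_lowpass(envelope, high_hz, rate_hz)
    slow = one_pole_lowpass(envelope, low_hz, rate_hz)
    return [f - s for f, s in zip(fast, slow)]


if __name__ == "__main__":
    rate = 100.0  # assumed feature frame rate in Hz
    # A 4 Hz modulation riding on a constant offset.
    env = [1.0 + 0.5 * math.sin(2.0 * math.pi * 4.0 * n / rate) for n in range(300)]
    filtered = modulation_bandpass(env, rate_hz=rate)
    print(max(abs(v) for v in filtered[150:]))
```

In a full front end the same filter would be applied independently to each channel of the feature vector sequence; a higher-order design would give steeper band edges than this first-order sketch.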

Bibliographic reference. Tchorz, Jürgen / Kasper, Klaus / Reininger, Herbert / Kollmeier, Birger (1997): "On the interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise", in EUROSPEECH-1997, 2075-2078.