13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Hooking Up Spectro-temporal Filters with Auditory-inspired Representations for Robust Automatic Speech Recognition

Bernd T. Meyer (1,2), Constantin Spille (1), Birger Kollmeier (1), Nelson Morgan (2,3)

(1) Medical Physics, Carl-von-Ossietzky Universität Oldenburg, Germany
(2) International Computer Science Institute, Berkeley, CA, USA
(3) EECS Department, University of California - Berkeley, Berkeley, CA, USA

Spectro-temporal filtering has been shown in previous work to produce features that can increase the robustness of automatic speech recognition (ASR). We replace the spectro-temporal representation used in that work with spectrograms that incorporate knowledge about signal processing in the human auditory system and are derived from Power-Normalized Cepstral Coefficients (PNCCs). 2D Gabor filters are applied to these spectrograms to extract features, which are evaluated on a noisy digit recognition task. The filter bank is adapted to the new representation by optimizing the spectral modulation frequency associated with each Gabor function. A comparison of the optimized parameters with the spectral modulation of vowels shows a good match between the optimized and the expected frequency ranges. When processed with a non-linear neural net and combined with PNCCs, Gabor features reduce the error rate by at least 19% compared to both the baseline and PNCCs.
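The core operation described above, applying a 2D Gabor filter with a given spectral and temporal modulation frequency to a spectrogram, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter shape (a separable Hann envelope multiplied by a complex sinusoidal carrier, with the DC component removed), the helper names `gabor_2d` and `filter_spectrogram`, and all parameter values are hypothetical choices made for the example. The input is a random stand-in for a PNCC-style spectrogram; the exponent 1/15 only mirrors the power-law compression used in PNCC processing.

```python
import numpy as np


def gabor_2d(omega_k, omega_n, size_k, size_n):
    """Hypothetical 2D Gabor filter: Hann envelope times a complex carrier.

    omega_k: spectral modulation frequency (radians per frequency channel)
    omega_n: temporal modulation frequency (radians per frame)
    size_k, size_n: filter extent in channels and frames (odd integers)
    """
    k = np.arange(size_k) - size_k // 2
    n = np.arange(size_n) - size_n // 2
    # Separable Hann envelope, maximal at the filter centre
    env_k = 0.5 + 0.5 * np.cos(2 * np.pi * k / (size_k + 1))
    env_n = 0.5 + 0.5 * np.cos(2 * np.pi * n / (size_n + 1))
    envelope = np.outer(env_k, env_n)
    # Complex sinusoid oscillating along both the spectral and temporal axes
    carrier = np.exp(1j * (omega_k * k[:, None] + omega_n * n[None, :]))
    g = envelope * carrier
    # Subtract a scaled envelope so the filter sums to zero (no DC response)
    return g - envelope * (g.sum() / envelope.sum())


def filter_spectrogram(spec, g):
    """2D correlation of spec (channels x frames) with g, zero-padded to 'same'
    size; returns the real part of the complex filter output."""
    size_k, size_n = g.shape
    pk, pn = size_k // 2, size_n // 2
    padded = np.pad(spec, ((pk, pk), (pn, pn)))
    out = np.zeros(spec.shape, dtype=complex)
    # Accumulate shifted copies of the spectrogram weighted by filter taps
    for i in range(size_k):
        for j in range(size_n):
            out += g[i, j] * padded[i:i + spec.shape[0], j:j + spec.shape[1]]
    return np.real(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in spectrogram: 40 channels x 100 frames, power-law compressed
    spec = rng.random((40, 100)) ** (1.0 / 15.0)
    # Illustrative (not optimized) modulation frequencies
    g = gabor_2d(omega_k=0.25 * np.pi, omega_n=0.1 * np.pi, size_k=9, size_n=21)
    feats = filter_spectrogram(spec, g)
    print(feats.shape)  # (40, 100)
```

Removing the DC component makes the filter respond to spectro-temporal modulation rather than overall level. A filter bank in the sense of the abstract would repeat this over several `(omega_k, omega_n)` pairs; adapting the bank to a new representation then amounts to choosing the `omega_k` values, which here are left as free, illustrative parameters.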

Index Terms: automatic speech recognition, spectrotemporal features, power-normalized features


Bibliographic reference: Meyer, Bernd T. / Spille, Constantin / Kollmeier, Birger / Morgan, Nelson (2012): "Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition", In INTERSPEECH-2012, pp. 1259-1262.