INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Comparing Different Flavors of Spectro-Temporal Features for ASR

Bernd T. Meyer (1), Suman V. Ravuri (1), Marc René Schädler (2), Nelson Morgan (1)

(1) ICSI, USA
(2) Carl von Ossietzky Universität Oldenburg, Germany

In the last decade, several studies have shown that the robustness of ASR systems can be increased when 2D Gabor filters are used to extract specific modulation frequencies from the input pattern. This paper analyzes important design parameters for spectro-temporal features based on a Gabor filter bank. We perform experiments with filters that differ in their phase sensitivity. We also analyze whether non-linear weighting with a multi-layer perceptron (MLP), followed by concatenation with mel-frequency cepstral coefficients (MFCCs), is beneficial. For the Aurora2 noisy digit recognition task, phase-sensitive filters improved on the MFCC baseline, whereas filters that neglect phase information did not. While MLP processing alone did not have a large effect on overall performance, the best results were obtained by combining MLP-processed phase-sensitive filter outputs with MFCCs, giving relative error reductions of over 40% for both noisy and clean training.
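
To make the distinction between phase-sensitive and phase-insensitive filters concrete, the following is a minimal sketch, not the authors' implementation. It assumes a complex 2D Gabor filter built from a Gaussian envelope and a complex sinusoid, with the real part of the filter output taken as the phase-sensitive feature and the magnitude as the feature that neglects phase. The function names, filter size, envelope width, and the 23-channel toy input are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def gabor_2d(omega_k, omega_n, size_k=15, size_n=15):
    """Complex 2D Gabor filter (assumed form: Gaussian envelope x complex carrier).

    omega_k: spectral modulation frequency in radians per frequency channel
    omega_n: temporal modulation frequency in radians per time frame
    """
    k = np.arange(size_k) - size_k // 2
    n = np.arange(size_n) - size_n // 2
    K, N = np.meshgrid(k, n, indexing="ij")
    # Envelope width is an illustrative assumption, not taken from the paper.
    sigma_k, sigma_n = size_k / 6.0, size_n / 6.0
    envelope = np.exp(-0.5 * ((K / sigma_k) ** 2 + (N / sigma_n) ** 2))
    carrier = np.exp(1j * (omega_k * K + omega_n * N))
    return envelope * carrier


def filter_same(spec, kernel):
    """Zero-padded 2D convolution via FFT, cropped to the shape of `spec`."""
    rows, cols = spec.shape
    kr, kc = kernel.shape
    full = (rows + kr - 1, cols + kc - 1)
    out = np.fft.ifft2(np.fft.fft2(spec, s=full) * np.fft.fft2(kernel, s=full))
    return out[kr // 2: kr // 2 + rows, kc // 2: kc // 2 + cols]


def gabor_features(spec, kernel):
    """Return (phase-sensitive, phase-insensitive) feature maps."""
    response = filter_same(spec, kernel)
    return response.real, np.abs(response)


# Toy input: 23 channels x 100 frames with a pure temporal modulation
# of 8 frames per cycle, identical in every channel.
frames = np.arange(100)
spec = np.tile(np.cos(2 * np.pi / 8 * frames), (23, 1))
real_feat, mag_feat = gabor_features(spec, gabor_2d(0.0, 2 * np.pi / 8))
```

On this toy input, and away from the zero-padded borders, the real-part feature oscillates with the phase of the input modulation, while the magnitude feature stays nearly constant. That is the sense in which the second variant neglects phase information.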


Bibliographic reference. Meyer, Bernd T. / Ravuri, Suman V. / Schädler, Marc René / Morgan, Nelson (2011): "Comparing different flavors of spectro-temporal features for ASR", in INTERSPEECH-2011, 1269-1272.