9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Introducing Temporal Asymmetries in Feature Extraction for Automatic Speech Recognition

G. S. V. S. Sivaram, Hynek Hermansky

IDIAP Research Institute, Switzerland

We propose a new auditory-inspired feature extraction technique for automatic speech recognition (ASR). Features are extracted by filtering the temporal trajectory of spectral energy in each critical band of speech with a bank of finite impulse response (FIR) filters. The impulse responses of these filters are derived from a modified Gabor envelope in order to emulate the asymmetries of the temporal receptive field (TRF) profiles observed in higher-level auditory neurons. Compared with the MRASTA technique, we obtain an 11.4% relative improvement in word error rate on the OGI-Digits database and a 3.2% relative improvement in phoneme error rate on the TIMIT database.
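The filtering scheme described in the abstract can be illustrated with a short sketch. The abstract does not give the exact form of the modified Gabor envelope, so this example simply assumes a cosine carrier under a Gaussian envelope whose width differs before and after its peak (a fast rise and a slower decay). This is one plausible way to make an impulse response temporally asymmetric; it is not the authors' exact filter design. The function names (`asymmetric_gabor_fir`, `fir_filter`, `extract_features`) and all parameter values are hypothetical. The input trajectories stand in for per-band spectral energies sampled at the frame rate.

```python
import math
from typing import List, Sequence


def asymmetric_gabor_fir(length: int, peak: int, sigma_rise: float,
                         sigma_decay: float, freq: float) -> List[float]:
    """Hypothetical asymmetric Gabor-like FIR taps (an assumption, not the paper's formula).

    A cosine carrier (freq in cycles per frame) is weighted by a Gaussian
    envelope of width sigma_rise before the peak tap and sigma_decay after it.
    Taps are scaled to unit L1 norm.
    """
    if not 0 <= peak < length:
        raise ValueError("peak must lie inside the filter")
    taps = []
    for k in range(length):
        t = k - peak
        sigma = sigma_rise if t < 0 else sigma_decay  # different widths on each side
        envelope = math.exp(-0.5 * (t / sigma) ** 2)
        taps.append(envelope * math.cos(2.0 * math.pi * freq * t))
    norm = sum(abs(v) for v in taps)  # > 0 because taps[peak] == 1 before scaling
    return [v / norm for v in taps]


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filtering y[t] = sum_k h[k] * x[t - k], truncated to len(x)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if t - k < 0:
                break  # no samples before the start of the trajectory
            acc += hk * x[t - k]
        y.append(acc)
    return y


def extract_features(bands: Sequence[Sequence[float]],
                     filters: Sequence[Sequence[float]]) -> List[List[float]]:
    """Filter every band trajectory with every filter in the bank.

    bands[b][n] is the energy of critical band b at frame n.
    Returns one feature vector per frame, ordered band-major
    (band 0 with each filter, then band 1 with each filter, ...).
    """
    outputs = [fir_filter(traj, h) for traj in bands for h in filters]
    n_frames = len(bands[0])
    return [[out[n] for out in outputs] for n in range(n_frames)]


if __name__ == "__main__":
    # Hypothetical two-filter bank: a low-pass-like and a band-pass-like filter.
    bank = [
        asymmetric_gabor_fir(31, 10, sigma_rise=3.0, sigma_decay=6.0, freq=0.0),
        asymmetric_gabor_fir(31, 10, sigma_rise=3.0, sigma_decay=6.0, freq=0.1),
    ]
    # Two toy "critical band" trajectories of 50 frames each.
    toy_bands = [[math.sin(0.2 * n) for n in range(50)],
                 [1.0 if n % 10 == 0 else 0.0 for n in range(50)]]
    features = extract_features(toy_bands, bank)
    print(len(features), len(features[0]))  # 50 frames, 2 bands x 2 filters = 4 values
```

Each band trajectory is filtered independently by every filter in the bank, so the feature dimension is the number of bands times the number of filters. A practical system would typically use many more bands and filters and a tuned set of envelope widths and carrier frequencies.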

