9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Front-End for Far-Field Speech Recognition Based on Frequency Domain Linear Prediction

Sriram Ganapathy, Samuel Thomas, Hynek Hermansky

IDIAP Research Institute, Switzerland

Automatic Speech Recognition (ASR) systems usually fail when they encounter speech from a far-field microphone in reverberant environments. This is due to the application of short-term feature extraction techniques, which do not compensate for the artifacts introduced by long room impulse responses. In this paper, we propose a front-end based on Frequency Domain Linear Prediction (FDLP) that tries to remove the reverberation artifacts present in far-field speech. Long temporal segments of far-field speech are analyzed in narrow frequency sub-bands to extract FDLP envelopes and residual signals. Filtering the residual signals with gain-normalized inverse FDLP filters results in a set of sub-band signals, which are synthesized to reconstruct the signal. ASR experiments on far-field speech data processed by the proposed front-end show significant improvements (a relative reduction of 30% in word error rate) compared to other robust feature extraction techniques.
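The envelope-extraction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common FDLP formulation, in which linear prediction is applied to a window of DCT coefficients of a long segment so that the resulting all-pole model approximates the squared temporal (Hilbert) envelope of that sub-band. The function names (`dct_ii`, `levinson`, `fdlp_envelope`), the rectangular sub-band window given as a DCT index range, the model order of 20, and the synthetic test signal are all hypothetical choices made for illustration. "Gain normalization" is taken here to mean replacing the model gain by 1. The residual filtering and signal resynthesis stages are not covered.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II via an explicit cosine matrix (O(N^2), fine for a sketch)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> (a, err), a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k                  # updated prediction error power
    return a, err

def fdlp_envelope(x, band, order=20, gain_normalize=False):
    """All-pole estimate of the squared temporal envelope in one sub-band.

    band is a (lo, hi) range of DCT indices (an assumed rectangular window).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    lo, hi = band
    c = dct_ii(x)[lo:hi]                    # sub-band DCT coefficients
    r = np.array([np.dot(c[:len(c) - m], c[m:]) for m in range(order + 1)])
    a, err = levinson(r, order)
    # Time sample t maps to angle pi * (t + 0.5) / n on the unit circle.
    theta = np.pi * (np.arange(n) + 0.5) / n
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    gain = 1.0 if gain_normalize else err   # assumption: normalization sets gain to 1
    return gain / np.abs(A) ** 2

# Synthetic example: noise shaped by a pulse centred at sample 300, plus a noise floor.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
pulse = np.exp(-0.5 * ((t - 300) / 30.0) ** 2)
x = pulse * rng.standard_normal(n) + 0.01 * rng.standard_normal(n)

env = fdlp_envelope(x, band=(64, 192))
env_norm = fdlp_envelope(x, band=(64, 192), gain_normalize=True)
```

In this sketch, `env` is large around the pulse and small elsewhere, while `env_norm` is unchanged when the input is rescaled, because the prediction coefficients do not depend on the overall signal level.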


Bibliographic reference. Ganapathy, Sriram / Thomas, Samuel / Hermansky, Hynek (2008): "Front-end for far-field speech recognition based on frequency domain linear prediction", in INTERSPEECH-2008, 984-987.