12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Modulation Spectrum Analysis for Recognition of Reverberant Speech

Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky

Johns Hopkins University, USA

Recognition of reverberant speech is a challenging problem for typical speech recognition systems. The difficulty stems mainly from the conventional short-term analysis and compensation techniques these systems rely on. In this paper, we present a feature extraction technique based on modeling long segments of the temporal envelopes of the speech signal in narrow sub-bands using frequency domain linear prediction (FDLP). FDLP provides an all-pole approximation of the Hilbert envelope of the signal by applying linear prediction to the cosine transform of the signal. We show that the FDLP modulation spectrum plays an important role in the robustness of the proposed feature extraction. Automatic speech recognition (ASR) experiments on speech data degraded with a number of room impulse responses (with varying degrees of distortion) show significant performance improvements for the proposed FDLP features over other robust feature extraction techniques (an average relative reduction of 40% in word error rate). Similar improvements are also obtained for far-field data that contain natural reverberation in background noise.
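
The FDLP step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name fdlp_envelope, the model order of 20, the type-II orthonormal DCT, and the autocorrelation (Yule-Walker) solution are assumptions chosen for illustration, and the sub-band decomposition and modulation spectrum computation mentioned in the abstract are omitted.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(x, order=20, n_points=None):
    """All-pole approximation of the squared Hilbert envelope of x.

    Hypothetical helper: linear prediction is applied to the DCT of the
    signal, so the all-pole "spectrum" is read along the time axis.
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) if n_points is None else n_points

    # Cosine transform of the whole segment (type-II, orthonormal: an assumption).
    c = dct(x, type=2, norm="ortho")

    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])

    # Yule-Walker equations for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    a = solve_toeplitz(r[:order], -r[1 : order + 1])

    # Prediction error power, used as the gain of the all-pole model.
    gain = r[0] + np.dot(a, r[1 : order + 1])

    # Evaluate the model on n_points frequencies in [0, pi); because the
    # prediction ran over DCT coefficients, each point maps to a time index.
    _, h = freqz([1.0], np.concatenate(([1.0], a)), worN=n_points)
    return gain * np.abs(h) ** 2
```

Called on a long segment, for example fdlp_envelope(segment, order=20), the function returns one envelope value per input sample. A higher order lets the model follow finer temporal detail, at the cost of a less smooth envelope.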

Bibliographic reference. Mallidi, Sri Harish / Ganapathy, Sriram / Hermansky, Hynek (2011): "Modulation spectrum analysis for recognition of reverberant speech", in INTERSPEECH-2011, 189-192.