12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Multistream Bandpass Modulation Features for Robust Speech Recognition

Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali

Johns Hopkins University, USA

Current understanding of speech processing in the brain suggests two parallel streams for processing temporal and spectral information. In these streams, slow and fast modulations are analyzed along parallel paths that encode different scales of information in speech signals. This biological strategy of analyzing the many kinds of information in speech along parallel paths may hold valuable lessons for the design of feature extraction front-ends in speech processing systems, particularly for dealing with extrinsic degradations and unseen noise distortions. Here, we propose a multistream approach to feature analysis for robust speaker-independent phoneme recognition in the presence of nonstationary background noise. The scheme centers on a multi-path bandpass modulation analysis of speech sounds, with each stream covering an entire range of temporal and spectral modulations. By applying bandpass operations that separate slow from fast information along the spectral and temporal dimensions, the scheme avoids the classic feature explosion problem of previous multistream approaches while retaining the advantages of parallelism and localized feature analysis. The proposed architecture yields substantial improvements over standard baseline features and over two state-of-the-art noise-robust feature schemes.
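The following minimal Python sketch illustrates the idea of bandpass modulation streams. It is not the authors' implementation. It assumes the input is a time-frequency representation, such as an auditory spectrogram. Its rows are frequency channels spaced uniformly on a log-frequency axis (`chans_per_oct` channels per octave), and its columns are frames sampled at `frame_rate` frames per second. Each stream is produced in three steps: take a 2-D DFT, zero every coefficient whose temporal rate (Hz) or spectral scale (cycles/octave) falls outside that stream's band, and invert. The function names, the rectangular filter shape, and any band edges passed in are hypothetical choices made for illustration. The paper's actual filters and stream layout may differ.

```python
import cmath
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Band = Tuple[float, float]


def dft1(seq, inverse=False):
    """Naive O(n^2) DFT of a 1-D sequence (inverse includes the 1/n factor)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(seq[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def dft2(x, inverse=False):
    """Separable 2-D DFT: transform each row (time axis), then each column (frequency axis)."""
    n_f, n_t = len(x), len(x[0])
    rows = [dft1(row, inverse) for row in x]
    cols = [dft1([rows[f][t] for f in range(n_f)], inverse) for t in range(n_t)]
    return [[cols[t][f] for t in range(n_t)] for f in range(n_f)]


def bin_freq(k: int, n: int, rate: float) -> float:
    """Signed frequency of DFT bin k for n samples taken at `rate` samples per unit."""
    return (k if k <= n // 2 else k - n) * rate / n


def bandpass_stream(spec: Matrix, frame_rate: float, chans_per_oct: float,
                    rate_band: Band, scale_band: Band) -> Matrix:
    """Keep modulations with |rate| in rate_band (Hz) and |scale| in scale_band (cyc/oct)."""
    n_f, n_t = len(spec), len(spec[0])
    coeffs = dft2(spec)
    for f in range(n_f):
        scale = abs(bin_freq(f, n_f, chans_per_oct))
        for t in range(n_t):
            rate = abs(bin_freq(t, n_t, frame_rate))
            inside = (rate_band[0] <= rate <= rate_band[1]
                      and scale_band[0] <= scale <= scale_band[1])
            if not inside:
                coeffs[f][t] = 0j
    # The mask treats +/- frequencies alike, so the inverse is real up to rounding error.
    return [[v.real for v in row] for row in dft2(coeffs, inverse=True)]


def multistream(spec: Matrix, frame_rate: float, chans_per_oct: float,
                bands: Dict[str, Tuple[Band, Band]]) -> Dict[str, Matrix]:
    """One bandpass-filtered copy of `spec` per stream; bands maps name -> (rate_band, scale_band)."""
    return {name: bandpass_stream(spec, frame_rate, chans_per_oct, rb, sb)
            for name, (rb, sb) in bands.items()}
```

As a usage example, consider a 16-frame spectrogram at 16 frames/s that contains both a 2 Hz and a 6 Hz temporal modulation. A stream with `rate_band=(1.0, 3.0)` keeps only the 2 Hz component, and a stream with `(5.0, 7.0)` keeps only the 6 Hz component. Each stream is a single bandpass region rather than a dense bank of narrowband rate-scale filters, so the output size grows with the number of streams. That is the property the abstract credits with avoiding feature explosion. The naive DFT keeps the sketch dependency-free. A practical front end would use an FFT library and smoother filter shapes.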


Bibliographic reference.  Nemala, Sridhar Krishna / Patil, Kailash / Elhilali, Mounya (2011): "Multistream bandpass modulation features for robust speech recognition", In INTERSPEECH-2011, 1277-1280.