Current understanding of speech processing in the brain suggests dual streams of processing of temporal and spectral information, whereby slow vs. fast modulations are analyzed along parallel paths that encode various scales of information in speech signals. The way biology analyzes the multiplicity of information in speech signals along parallel paths holds valuable lessons for feature extraction front-ends in speech processing systems, particularly for dealing with extrinsic degradations and unseen noise distortions. Here, we propose a multistream approach to feature analysis for robust speaker-independent phoneme recognition in the presence of nonstationary background noises. The scheme presented here centers around a multi-path bandpass modulation analysis of speech sounds, with each stream covering an entire range of temporal and spectral modulations. By performing bandpass operations of slow vs. fast information along the spectral and temporal dimensions, the proposed scheme avoids the classic feature explosion problem of previous multistream approaches while maintaining the advantage of parallelism and localized feature analysis. The proposed architecture results in substantial improvements over standard baseline features and two state-of-the-art noise-robust feature schemes.
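As a rough illustration of the bandpass idea described above, the following sketch splits a spectrogram into a "slow" and a "fast" stream by filtering along the temporal and spectral dimensions in the modulation domain. It is a minimal sketch under stated assumptions, not the method of the paper: the function names (`bandpass`, `modulation_stream`, `extract_streams`), the band edges in `STREAMS`, and the use of a naive DFT with ideal (brick-wall) masks are all hypothetical choices made for readability.

```python
import cmath
import math

# Hypothetical band edges (NOT taken from the paper): temporal "rate" in Hz,
# spectral "scale" in cycles per octave. Each band is half-open: [low, high).
STREAMS = {
    "slow": {"rate": (0.0, 8.0), "scale": (0.0, 1.0)},
    "fast": {"rate": (8.0, math.inf), "scale": (1.0, math.inf)},
}


def bandpass(signal, rate, low, high):
    """Keep components of a real `signal` whose modulation frequency f
    satisfies low <= f < high. `rate` is the sampling rate of the axis
    (frames per second for time, channels per octave for frequency).
    Uses a naive O(n^2) DFT so the sketch needs no external library."""
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Fold negative-frequency bins onto their positive counterparts so
        # the mask is symmetric and the filtered signal stays real.
        f = min(k, n - k) * rate / n
        if not (low <= f < high):
            spectrum[k] = 0
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def modulation_stream(spec, frame_rate, chan_per_oct, rate_band, scale_band):
    """Filter a spectrogram (list of channels, each a list over frames)
    along time, then along frequency. Output has the same shape as `spec`."""
    rows = [bandpass(channel, frame_rate, *rate_band) for channel in spec]
    cols = [bandpass(list(frame), chan_per_oct, *scale_band) for frame in zip(*rows)]
    return [list(channel) for channel in zip(*cols)]


def extract_streams(spec, frame_rate, chan_per_oct):
    """Return one filtered spectrogram per entry in STREAMS."""
    return {
        name: modulation_stream(spec, frame_rate, chan_per_oct, band["rate"], band["scale"])
        for name, band in STREAMS.items()
    }
```

In this sketch each stream has the same dimensions as the input spectrogram, so the total feature size grows only with the number of streams. A practical front-end would presumably use an FFT library and smoother filter shapes rather than the ideal masks assumed here.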
Bibliographic reference. Nemala, Sridhar Krishna / Patil, Kailash / Elhilali, Mounya (2011): "Multistream bandpass modulation features for robust speech recognition", In INTERSPEECH-2011, 1277-1280.