September 22-25, 1997
To overcome the problems caused by the long impulse responses that reverberation produces, we use a long time window (high frequency resolution) analysis during the channel normalization steps of feature extraction for automatic speech recognition (ASR). After normalization, frequency resolution is traded for time resolution to increase the rate at which the time information is sampled (short-time domain), yielding an appropriate domain from which to derive ASR features. Experiments on data with reverberation times of about 0.5 s show that the new technique significantly improves the performance of a speech recognizer under reverberation, with only some performance degradation on clean speech.
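The two-stage idea can be illustrated with a minimal sketch. The abstract does not say which normalization is applied or how the resolution trade is carried out, so everything specific here is an assumption: log-spectral mean subtraction as the normalization, overlap-add resynthesis followed by a short-window analysis as the resolution trade, and the hypothetical names `stft`, `istft`, `multiresolution_features`, `long_win`, and `short_win`. At an assumed 8 kHz sampling rate, the default long window of 4096 samples spans about 0.5 s.

```python
import numpy as np

def stft(x, win_len, hop):
    """Hann-windowed STFT; returns shape (frames, win_len // 2 + 1)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, win_len, hop):
    """Weighted overlap-add resynthesis matching stft() above."""
    window = np.hanning(win_len)
    frames = np.fft.irfft(spec, n=win_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + win_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win_len] += frame * window
        norm[i * hop:i * hop + win_len] += window ** 2
    # Guard against division by near-zero weights at the signal edges.
    return out / np.maximum(norm, 1e-3)

def multiresolution_features(x, long_win=4096, short_win=256, floor=1e-10):
    """Normalize with a long window, then analyze with a short one.

    Assumption: the normalization is mean subtraction of the log-magnitude
    spectrum over time, done separately in each frequency bin.
    """
    # Reflect-pad so every original sample is fully overlapped by long frames.
    padded = np.pad(x, long_win, mode="reflect")
    hop = long_win // 4

    # Stage 1: high frequency resolution, channel normalization.
    spec = stft(padded, long_win, hop)
    log_mag = np.log(np.maximum(np.abs(spec), floor))
    log_mag -= log_mag.mean(axis=0, keepdims=True)
    normalized = np.exp(log_mag + 1j * np.angle(spec))  # keep original phase

    # Stage 2: back to the time domain, then high time resolution analysis.
    y = istft(normalized, long_win, hop)[long_win:long_win + len(x)]
    short_spec = stft(y, short_win, short_win // 2)
    return np.log(np.maximum(np.abs(short_spec), floor))
```

In this sketch a fixed gain on the input cancels in the mean subtraction, so `multiresolution_features(x)` and `multiresolution_features(5.0 * x)` agree up to rounding. A real recognizer front end would go on to derive its features from the short-time representation; that step is left out here.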
Bibliographic reference. Avendano, Carlos / Tibrewala, Sangita / Hermansky, Hynek (1997): "Multiresolution channel normalization for ASR in reverberant environments", In EUROSPEECH-1997, 1107-1110.