10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Temporal Modulation Processing of Speech Signals for Noise Robust ASR

Hong You, Abeer Alwan

University of California at Los Angeles, USA

In this paper, we analyze the temporal modulation characteristics of speech and noise from a speech/non-speech discrimination point of view. Although previous psychoacoustic studies [3][10] have shown that low-frequency temporal modulation components are important for speech intelligibility, no analysis of modulation components from the standpoint of speech/noise discrimination has been reported. Our data-driven analysis of the modulation components of speech and noise reveals that speech and noise are classified more accurately using low-pass filtered modulation frequencies than band-pass filtered ones. The effects of additive noise on the modulation characteristics of speech signals are also analyzed. Based on this analysis, we propose a frequency-adaptive modulation processing algorithm for noise-robust ASR. The algorithm is based on speech channel classification and modulation pattern denoising. Speech recognition experiments compare the proposed algorithm with other noise-robust front ends, including RASTA and ETSI AFE. The recognition results show that frequency-adaptive modulation processing is promising.
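To make the distinction between low-pass and band-pass modulation filtering concrete, the following minimal sketch filters the frame-by-frame energy trajectory of a single frequency channel in the modulation domain. It is not the algorithm proposed in the paper: the function name `modulation_filter`, the 100 Hz frame rate, the 16 Hz upper cutoff, the 1 Hz lower cutoff, and the synthetic trajectory are all illustrative assumptions. The only difference between the two outputs is whether the 0 Hz (DC) modulation component is kept.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short trajectories)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def modulation_filter(trajectory, frame_rate_hz, lo_hz, hi_hz):
    """Keep only modulation frequencies in [lo_hz, hi_hz].

    `trajectory` is a hypothetical per-frame energy sequence for one
    frequency channel; this helper is an illustration, not the paper's method.
    """
    n = len(trajectory)
    spectrum = dft(trajectory)
    for k in range(n):
        # Fold bin k so negative-frequency bins mirror the positive ones.
        freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (lo_hz <= freq_hz <= hi_hz):
            spectrum[k] = 0j
    return idft(spectrum)


# Synthetic trajectory (assumed): a constant offset, a 4 Hz modulation,
# and a 30 Hz modulation, sampled at 100 frames per second for 1 second.
frame_rate = 100
num_frames = 100
traj = [1.0
        + math.cos(2 * math.pi * 4 * t / frame_rate)
        + 0.5 * math.cos(2 * math.pi * 30 * t / frame_rate)
        for t in range(num_frames)]

# Low-pass: keeps the offset and the 4 Hz component, drops 30 Hz.
low_passed = modulation_filter(traj, frame_rate, 0.0, 16.0)

# Band-pass: additionally drops the 0 Hz offset, leaving only 4 Hz.
band_passed = modulation_filter(traj, frame_rate, 1.0, 16.0)
```

In this toy setting the low-pass output retains the channel's mean energy level while the band-pass output discards it, which is the kind of difference a speech/noise classifier could exploit.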


Bibliographic reference.  You, Hong / Alwan, Abeer (2009): "Temporal modulation processing of speech signals for noise robust ASR", In INTERSPEECH-2009, 36-39.