8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Optimization of Temporal Filters in the Modulation Frequency Domain for Constructing Robust Features in Speech Recognition

Jeih-weih Hung

National Chi Nan University, Taiwan

In this paper, we derive new data-driven temporal filters that employ the statistics of the modulation spectra of the speech features. The new temporal filtering approaches are based on constrained versions of Principal Component Analysis (C-PCA) and Maximum Class Distance (C-MCD), respectively. It is shown that the proposed C-PCA and C-MCD temporal filters can effectively improve speech recognition accuracy in various noise-corrupted environments. In experiments conducted on Test Set A of the Aurora-2 noisy digits database, these new temporal filters, together with cepstral mean and variance normalization (CMVN), provide average relative error reduction rates of over 40% and 27% when compared with the baseline MFCC processing and CMVN alone, respectively.
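
As a rough illustration of two terms used in the abstract, the sketch below shows per-utterance CMVN and the usual way a relative error reduction rate is computed from two accuracies. It is not the paper's implementation and does not cover the C-PCA or C-MCD filters; the function names and the example numbers are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """Per-utterance cepstral mean and variance normalization (assumed form).

    frames: list of frames, each a list of cepstral coefficients.
    Each coefficient trajectory is shifted to zero mean and scaled to unit variance.
    """
    columns = list(zip(*frames))  # one time trajectory per coefficient
    means = [fmean(c) for c in columns]
    # Guard against constant trajectories, whose standard deviation is zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]


def relative_error_reduction(baseline_acc, new_acc):
    """Relative error reduction in percent, given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Hypothetical toy input: three frames with two coefficients each.
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
normalized = cmvn(frames)

# Hypothetical accuracies, not results from the paper.
reduction = relative_error_reduction(80.0, 88.0)
```

With these made-up accuracies, the error rate drops from 20% to 12%, which is a relative reduction of 40%.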

Bibliographic reference.  Hung, Jeih-weih (2007): "Optimization of temporal filters in the modulation frequency domain for constructing robust features in speech recognition", In INTERSPEECH-2007, 1090-1093.