In this paper, we derive new data-driven temporal filters that exploit the statistics of the modulation spectra of speech features. The two new temporal filtering approaches are based on constrained versions of Principal Component Analysis (C-PCA) and Maximum Class Distance (C-MCD), respectively. It is shown that the proposed C-PCA and C-MCD temporal filters can effectively improve speech recognition accuracy in various noise-corrupted environments. In experiments on Test Set A of the Aurora-2 noisy digits database, these new temporal filters, combined with cepstral mean and variance normalization (CMVN), provide average relative error reduction rates of over 40% and 27% compared with baseline MFCC processing and CMVN alone, respectively.
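To illustrate what a "data-driven temporal filter" is, the sketch below shows the plain, unconstrained PCA variant. It is not the constrained C-PCA or C-MCD formulation described above, and it works on time-domain feature trajectories rather than directly on modulation spectra. For each cepstral dimension, it collects sliding windows of that dimension's trajectory and takes the leading eigenvector of their covariance as an FIR filter. That filter is then applied along time. The function names `pca_temporal_filter` and `apply_temporal_filter` are hypothetical, and the default filter length of 11 frames is an arbitrary assumption. The input is assumed to be MFCC-like features, optionally already CMVN-normalized.

```python
import numpy as np

def pca_temporal_filter(features, length=11):
    """Hypothetical sketch: derive one FIR temporal filter per feature dimension.

    features: array of shape (num_frames, num_dims), e.g. (CMVN-normalized) MFCCs.
    length:   filter length in frames (assumed value, not taken from the paper).
    Returns an array of shape (num_dims, length) with unit-norm filters.
    """
    num_frames, num_dims = features.shape
    if num_frames < length:
        raise ValueError("need at least `length` frames to estimate the filters")
    filters = np.empty((num_dims, length))
    for d in range(num_dims):
        # All length-`length` windows of this dimension's trajectory.
        windows = np.lib.stride_tricks.sliding_window_view(features[:, d], length)
        windows = windows - windows.mean(axis=0)
        # Sample covariance of the windowed trajectories (length x length).
        cov = windows.T @ windows / windows.shape[0]
        # eigh returns eigenvalues in ascending order; take the leading eigenvector.
        _, eigvecs = np.linalg.eigh(cov)
        h = eigvecs[:, -1]
        # Eigenvector sign is arbitrary; choose the sign with non-negative DC gain.
        if h.sum() < 0:
            h = -h
        filters[d] = h
    return filters

def apply_temporal_filter(features, filters):
    """Filter each feature dimension along time with its own FIR filter."""
    out = np.empty(features.shape, dtype=float)
    for d in range(features.shape[1]):
        # mode="same" keeps the output length equal to the number of frames.
        out[:, d] = np.convolve(features[:, d], filters[d], mode="same")
    return out
```

In a typical setup, the filters would be estimated once on clean training features and then applied to both training and test features before recognition. The constrained variants in the paper add requirements on the filter design that this sketch does not attempt to reproduce.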
Bibliographic reference. Hung, Jeih-weih (2007): "Optimization of temporal filters in the modulation frequency domain for constructing robust features in speech recognition", In INTERSPEECH-2007, 1090-1093.