Spectro-temporal Gabor features based on auditory knowledge have improved word accuracy for automatic speech recognition in the presence of noise. In previous work, we generated robust spectro-temporal features that incorporated the power normalized cepstral coefficient (PNCC) algorithm. The corresponding power normalized spectrum (PNS) was then processed by many Gabor filters, yielding a high-dimensional feature vector. In tandem processing, an MLP with one hidden layer is often employed to learn discriminative transformations from front-end features, in this case Gabor-filtered power spectra, to probabilistic features; we refer to this system as PNS-Gabor MLP. Here we improve PNS-Gabor MLP in two ways. First, we select informative Gabor features using sparse principal component analysis (sparse PCA) before tandem processing. Second, we use a deep neural network (DNN) with a bottleneck structure. Experiments show that the high-dimensional Gabor features are redundant: in our experiments, sparse PCA suggests that Gabor filters with longer time scales are particularly informative. The best of our experimental modifications gave a relative error rate reduction of 15.5% over PNS-Gabor MLP plus MFCC, and of 41.4% over an MFCC baseline, on a large vocabulary continuous speech recognition task using noisy data.
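To illustrate the feature-selection idea from the abstract, the sketch below selects a sparse subset of dimensions from a toy high-dimensional feature matrix. This is not the paper's actual pipeline: the data are synthetic stand-ins for Gabor features, and hard-thresholded PCA loadings (computed via SVD) are used here as a crude NumPy-only surrogate for sparse PCA, which the authors use in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a high-dimensional Gabor feature matrix:
# 200 frames x 40 feature dimensions, where only the first 8
# dimensions carry a shared signal (the rest are pure noise).
X = rng.normal(size=(200, 40))
shared_signal = rng.normal(size=(200, 1))
X[:, :8] += 3.0 * shared_signal

# PCA via SVD on the mean-centered data; take the top principal direction.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:1]

# Hard-threshold small loadings to zero (a simple sparsity surrogate),
# then keep only the feature dimensions with a surviving loading.
sparse_loadings = np.where(np.abs(loadings) > 0.1, loadings, 0.0)
selected = np.flatnonzero(np.any(sparse_loadings != 0, axis=0))

# With this setup, the selected set should be dominated by the
# signal-bearing dimensions (indices 0-7).
print(selected)
```

A true sparse PCA instead imposes an L1 penalty on the loadings during the decomposition itself, which is what lets it rank entire groups of filters (e.g. those with longer time scales) by informativeness.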
Bibliographic reference. Chang, Shuo-Yiin / Morgan, Nelson (2013): "Informative spectro-temporal bottleneck features for noise-robust speech recognition", In INTERSPEECH-2013, 99-103.