EUROSPEECH 2003 - INTERSPEECH 2003
In automatic speech recognition, the popular Mel Frequency Cepstral Coefficients (MFCC) perform very well as features under clean and matched conditions, but are observed to fail under mismatched conditions. The spectral maxima are often observed to preserve their locations and energies in noisy environments, yet the MFCC features do not represent them explicitly. This paper presents a framework for representing the maxima information for robust recognition in the presence of additive White Gaussian Noise (WGN). For the task of phoneme-based Isolated Word Recognition (IWR) at different Signal-to-Noise Ratios (SNR), the results show improved recognition performance. The cepstral features are computed from a spectrogram that is reconstructed by fitting Gaussians around the spectral maxima. Given the inherent robustness and easy trackability of the maxima, this representation opens up interesting avenues towards robust feature representations as well as preprocessing techniques.
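As a rough illustration of the idea of rebuilding a spectrum from Gaussians placed at its maxima and then taking cepstral features, the following minimal sketch processes a single spectral frame. It is not the authors' implementation: the abstract does not specify the peak-picking rule, the Gaussian widths, how overlapping Gaussians are combined, or the cepstral transform. The function names, the fixed `width_bins`, the `num_peaks` limit, the `floor` value, the pointwise-maximum combination, and the plain DCT-II without a Mel filterbank are all assumptions made for this example.

```python
import numpy as np


def find_spectral_maxima(spectrum, num_peaks=4):
    """Return sorted bin indices of the strongest local maxima (assumed rule)."""
    s = np.asarray(spectrum, dtype=float)
    interior = np.arange(1, len(s) - 1)
    # A bin counts as a peak if it exceeds its left neighbour and is not
    # smaller than its right neighbour.
    is_peak = (s[interior] > s[interior - 1]) & (s[interior] >= s[interior + 1])
    peaks = interior[is_peak]
    # Keep only the num_peaks largest maxima, then restore frequency order.
    strongest = peaks[np.argsort(s[peaks])[::-1][:num_peaks]]
    return np.sort(strongest)


def reconstruct_from_maxima(spectrum, num_peaks=4, width_bins=3.0, floor=1e-6):
    """Rebuild a frame as Gaussians centred on its maxima (assumed fixed width)."""
    s = np.asarray(spectrum, dtype=float)
    bins = np.arange(len(s))
    peaks = find_spectral_maxima(s, num_peaks)
    recon = np.full(len(s), floor)  # small floor keeps the later log finite
    for p in peaks:
        # Gaussian with the peak's own height, centred on the peak bin.
        bump = s[p] * np.exp(-0.5 * ((bins - p) / width_bins) ** 2)
        # Assumption: overlapping Gaussians are combined by pointwise maximum.
        recon = np.maximum(recon, bump)
    return recon, peaks


def cepstrum(spectrum, num_coeffs=13):
    """Unnormalised DCT-II of the log spectrum (no Mel filterbank in this sketch)."""
    log_spec = np.log(np.asarray(spectrum, dtype=float))
    n = len(log_spec)
    k = np.arange(num_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return basis @ log_spec


if __name__ == "__main__":
    # Synthetic frame with two resonance-like bumps on a small constant floor.
    bins = np.arange(128)
    frame = (1.0 * np.exp(-0.5 * ((bins - 20) / 4.0) ** 2)
             + 0.6 * np.exp(-0.5 * ((bins - 60) / 5.0) ** 2)
             + 0.01)
    recon, peaks = reconstruct_from_maxima(frame, num_peaks=2)
    print("peak bins:", peaks)
    print("cepstral coefficients:", cepstrum(recon))
```

Applying `reconstruct_from_maxima` to every frame of a spectrogram and passing each result to `cepstrum` would give one feature vector per frame. In a real front end the widths and peak count would need tuning, and a Mel-scale filterbank would normally precede the log and DCT.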
Bibliographic reference. Sujatha, J. / Kumar, K.R. Prasanna / Ramakrishnan, K.R. / Balakrishnan, N. (2003): "Spectral maxima representation for robust automatic speech recognition", In EUROSPEECH-2003, 3077-3080.