12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Minimum Classification Error Based Spectro-Temporal Feature Extraction for Robust Audio Classification

Yuan-Fu Liao, Chia-Hsing Lin, We-Der Fang

National Taipei University of Technology, Taiwan

Mel-frequency cepstral coefficients (MFCCs) are the most popular features for automatic audio classification (AAC). However, MFCCs are often not robust in adverse environments. In this paper, a minimum classification error (MCE)-based method is proposed to extract new, robust spectro-temporal features as alternatives to MFCCs. The robustness of the proposed features is evaluated on noisy non-speech sounds from the RWCP Sound Scene Database in Real Acoustic Environment, using settings modeled on the Aurora 2 multi-condition training task. Experimental results show that the proposed features achieve the lowest average recognition error rate, 3.17%, compared with 4.31% for state-of-the-art MFCCs with mean subtraction, variance normalization, and ARMA filtering (MFCC+MVA), 4.43% for Gabor filters with principal component analysis (Gabor+PCA), and 4.20% for linear discriminant analysis (LDA) features. We thus confirm the robustness of the proposed spectro-temporal feature extraction approach.
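
For readers unfamiliar with the MFCC+MVA baseline named above, the following is a minimal sketch of MVA post-processing applied to a sequence of feature frames. It is not taken from the paper: it assumes the ARMA form commonly described for MVA (each output frame averages the previous smoothed outputs with the current and upcoming normalized inputs), and the function name `mva`, the parameter `order`, and the choice to leave edge frames unsmoothed are illustrative assumptions.

```python
from statistics import fmean, pstdev


def mva(frames, order=2, eps=1e-8):
    """Mean subtraction, variance normalization, and ARMA smoothing.

    frames: list of T feature vectors, each a list of D floats.
    order:  ARMA order M (hypothetical default; not specified by the abstract).
    """
    dims = list(zip(*frames))               # D tuples, each of length T
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) + eps for d in dims]  # eps guards against zero variance

    # Per-dimension mean and variance normalization over the whole sequence.
    x = [[(v - m) / s for v, m, s in zip(f, means, stds)] for f in frames]

    # ARMA smoothing; the first and last `order` frames are left unsmoothed.
    y = [row[:] for row in x]
    for t in range(order, len(x) - order):
        for d in range(len(means)):
            past = sum(y[t - i][d] for i in range(1, order + 1))  # smoothed outputs
            future = sum(x[t + j][d] for j in range(order + 1))   # current + upcoming inputs
            y[t][d] = (past + future) / (2 * order + 1)
    return y
```

In this sketch, `mva(mfcc_frames, order=2)` would be called once per recording, where `mfcc_frames` is a hypothetical list of MFCC vectors computed elsewhere.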

Bibliographic reference.  Liao, Yuan-Fu / Lin, Chia-Hsing / Fang, We-Der (2011): "Minimum classification error based spectro-temporal feature extraction for robust audio classification", In INTERSPEECH-2011, 241-244.