12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Real-World Speech/Non-Speech Audio Classification Based on Sparse Representation Features and GPCs

Ziqiang Shi, Jiqing Han, Tieran Zheng

Harbin Institute of Technology, China

A novel, robust approach to content-based speech/non-speech audio classification is proposed, based on sparse representation (SR) features and Gaussian process classifiers (GPCs). Projections of the noise-robust sparse representations of audio signals, computed by L1-norm minimization, are used as features, and GPCs are used to learn and predict audio categories. Whereas the hyperparameters of Support Vector Machines (SVMs) are difficult to determine, GPCs estimate them with a Bayesian selection criterion. Experimental results on real-world audio datasets show that the SR features are more robust to audio variations than mel-frequency cepstral coefficients (MFCCs), and that the proposed approach outperforms SVMs.
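The abstract states that the SR features come from L1-norm minimization but gives no formulation. As a hedged illustration only, the sketch below solves the Lagrangian (LASSO) form min_x 0.5 * ||y - D x||^2 + lam * ||x||_1 with ISTA (iterative shrinkage-thresholding), a standard solver for this problem. The paper's actual solver, dictionary construction, noise model, and the projection applied to the sparse codes are not reproduced. The dictionary D, the weight `lam`, and the helper names `ista` and `soft_threshold` are hypothetical.

```python
"""Minimal sketch of L1-regularized sparse coding via ISTA.

Assumption: Lagrangian (LASSO) form
    min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1
This is an illustration, not the authors' method.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: m[i][j] is row i, column j


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Compute a @ x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def mat_t_vec(a: Matrix, r: Vector) -> Vector:
    """Compute a.T @ r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lipschitz_bound(d: Matrix) -> float:
    """Squared Frobenius norm: an upper bound on the largest eigenvalue of D^T D."""
    return sum(x * x for row in d for x in row)


def ista(d: Matrix, y: Vector, lam: float, n_iter: int = 500) -> Vector:
    """Return a sparse code x such that D x approximates y."""
    step = 1.0 / lipschitz_bound(d)  # step size 1/L guarantees convergence
    x = [0.0] * len(d[0])
    for _ in range(n_iter):
        residual = [dx - yi for dx, yi in zip(mat_vec(d, x), y)]  # D x - y
        grad = mat_t_vec(d, residual)  # gradient of the quadratic data term
        # Gradient step on the data term, then shrinkage for the L1 term.
        x = [soft_threshold(xj - step * gj, step * lam) for xj, gj in zip(x, grad)]
    return x
```

For example, with an identity dictionary the problem separates per coordinate and reduces to soft-thresholding: `ista([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.2], 0.5)` converges to approximately `[1.5, 0.0]`, so small coefficients are driven exactly to zero. In a real front end, D would typically be overcomplete (more atoms than signal dimensions), which is where the L1 penalty earns its keep by selecting a few atoms.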


Bibliographic reference.  Shi, Ziqiang / Han, Jiqing / Zheng, Tieran (2011): "Real-world speech/non-speech audio classification based on sparse representation features and GPCs", In INTERSPEECH-2011, 2401-2404.