This paper presents an effective speech/non-speech discrimination method for improving the performance of speech processing systems operating in noisy environments. The proposed method uses a trained support vector machine (SVM) that defines an optimized non-linear decision rule over different sets of speech features. Two alternative feature extraction processes are compared, based on: i) subband SNR estimation after denoising, and ii) long-term SNR estimation. With both feature sets, the SVM-based classifier is able to learn how the signal is masked by the acoustic noise and to define an effective non-linear decision rule. However, the feature vector incorporating contextual information yields better speech/non-speech discrimination even when no denoising is applied. The experimental analysis carried out on the Spanish SpeechDat-Car database shows clear improvements over standard voice activity detectors (VADs), including ITU G.729, ETSI AMR and ETSI AFE for distributed speech recognition (DSR), as well as other recently reported VADs.
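As a rough illustration of the pipeline summarized above, the following Python sketch computes subband SNR features, a long-term variant that draws on neighboring frames, and a kernel decision function of the kind an SVM produces. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the equal-width band split, the per-bin maximum used as the long-term envelope, the RBF kernel, and the hand-picked support vectors are all hypothetical choices made for illustration, and no training or denoising step is shown.

```python
import math

def subband_snr(frame_power, noise_power, n_bands):
    """Average SNR in dB over n_bands equal-width subbands (assumed band layout)."""
    width = len(frame_power) // n_bands
    snrs = []
    for k in range(n_bands):
        lo, hi = k * width, (k + 1) * width
        sig = sum(frame_power[lo:hi]) / width
        noi = sum(noise_power[lo:hi]) / width
        # Floor both terms to avoid log of zero or division by zero.
        snrs.append(10.0 * math.log10(max(sig, 1e-12) / max(noi, 1e-12)))
    return snrs

def long_term_snr(frames_power, noise_power, t, order, n_bands):
    """Subband SNR of a long-term envelope around frame t.

    Assumption: the envelope is the per-bin maximum over frames
    t-order .. t+order, clipped at the sequence boundaries.
    """
    lo = max(0, t - order)
    hi = min(len(frames_power), t + order + 1)
    envelope = [max(f[j] for f in frames_power[lo:hi])
                for j in range(len(noise_power))]
    return subband_snr(envelope, noise_power, n_bands)

def rbf_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel decision value f(x) = sum_i a_i * exp(-gamma * ||x - s_i||^2) + b."""
    total = bias
    for sv, a in zip(support_vectors, dual_coefs):
        dist2 = sum((xi - si) ** 2 for xi, si in zip(x, sv))
        total += a * math.exp(-gamma * dist2)
    return total

# Toy power spectra: flat noise, and one frame with energy in the low band.
noise = [1.0] * 8
speech = [100.0] * 4 + [1.0] * 4
frames = [noise, speech, noise]

# Hypothetical support vectors and coefficients, NOT obtained by training.
svs = [[20.0, 0.0], [0.0, 0.0]]
coefs = [1.0, -1.0]

for t in range(len(frames)):
    short = subband_snr(frames[t], noise, n_bands=2)
    long_ = long_term_snr(frames, noise, t, order=1, n_bands=2)
    score = rbf_decision(short, svs, coefs, bias=0.0, gamma=0.01)
    print(t, short, long_, "speech" if score > 0 else "non-speech")
```

In this toy input the long-term features of the frames adjacent to the speech frame already reflect the speech energy, which hints at how contextual information can change the decision near word boundaries. In practice the support vectors, coefficients and kernel parameters would come from training on labeled data.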
Cite as: Ramírez, J., Yélamos, P., Górriz, J.M., Segura, J.C., García, L. (2006) Speech/non-speech discrimination combining advanced feature extraction and SVM learning. Proc. Interspeech 2006, paper 1134-Wed1FoP.3, doi: 10.21437/Interspeech.2006-463
@inproceedings{ramirez06_interspeech,
  author={Javier Ramírez and Pablo Yélamos and J. M. Górriz and José C. Segura and L. García},
  title={{Speech/non-speech discrimination combining advanced feature extraction and SVM learning}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1134-Wed1FoP.3},
  doi={10.21437/Interspeech.2006-463}
}