INTERSPEECH 2006 - ICSLP
This paper presents an effective speech/non-speech discrimination method for improving the performance of speech processing systems working in noisy environments. The proposed method uses a trained support vector machine (SVM) that defines an optimized non-linear decision rule over different sets of speech features. Two alternative feature extraction processes are compared, based on i) subband SNR estimation after denoising, and ii) long-term SNR estimation. With both, the SVM-based classifier is able to learn how the signal is masked by the acoustic noise and to define an effective non-linear decision rule. However, a feature vector incorporating contextual information yields better speech/non-speech discrimination, even when no denoising is applied. The experimental analysis carried out on the Spanish SpeechDat-Car database shows clear improvements over standard VADs, including ITU G.729, ETSI AMR and ETSI AFE for distributed speech recognition (DSR), as well as other recently reported VADs.
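The two kinds of feature vector mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width band layout, the per-bin maximum used as a long-term envelope, the window order, and the function names `subband_snr_db`, `long_term_envelope` and `long_term_subband_snr_db` are all assumptions made for illustration. The denoising stage and the SVM itself are omitted; the resulting vectors would be what a trained classifier receives as input.

```python
import math


def subband_snr_db(power_spectrum, noise_spectrum, n_bands=4, floor=1e-12):
    """Per-band SNR in dB from a frame's power spectrum and a noise estimate.

    Assumption: bands are equal-width groups of FFT bins; the abstract does
    not specify the band layout.
    """
    if len(power_spectrum) != len(noise_spectrum):
        raise ValueError("spectra must have the same length")
    n_bins = len(power_spectrum)
    if not 1 <= n_bands <= n_bins:
        raise ValueError("n_bands must be between 1 and the number of bins")
    snrs = []
    for k in range(n_bands):
        lo = k * n_bins // n_bands
        hi = (k + 1) * n_bins // n_bands
        signal = sum(power_spectrum[lo:hi]) / (hi - lo)
        noise = sum(noise_spectrum[lo:hi]) / (hi - lo)
        # Floor both terms so silent bands do not produce log(0) or division by zero.
        snrs.append(10.0 * math.log10(max(signal, floor) / max(noise, floor)))
    return snrs


def long_term_envelope(frames, center, order):
    """Per-bin maximum over frames center-order .. center+order (clipped at the edges).

    Assumption: a max-envelope is used here as one simple way to bring
    neighbouring frames (context) into the feature.
    """
    start = max(0, center - order)
    stop = min(len(frames), center + order + 1)
    return [max(column) for column in zip(*frames[start:stop])]


def long_term_subband_snr_db(frames, noise_spectrum, center, order=2, n_bands=4):
    """Subband SNR computed on the long-term envelope instead of a single frame."""
    envelope = long_term_envelope(frames, center, order)
    return subband_snr_db(envelope, noise_spectrum, n_bands)


if __name__ == "__main__":
    noise = [1.0] * 8
    speech_like = [10.0] * 4 + [100.0] * 4
    frames = [[1.0] * 8, speech_like, [1.0] * 8]
    print(subband_snr_db(frames[0], noise, n_bands=2))
    print(long_term_subband_snr_db(frames, noise, center=0, order=1, n_bands=2))
```

In this toy input, frame 0 on its own looks like noise, while the long-term variant also sees the energy in the neighbouring frame. This is the sense in which a contextual feature vector gives the classifier more evidence per decision.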
Bibliographic reference. Ramírez, Javier / Yélamos, Pablo / Górriz, J. M. / Segura, José C. / García, L. (2006): "Speech/non-speech discrimination combining advanced feature extraction and SVM learning", In INTERSPEECH-2006, paper 1134-Wed1FoP.3.