INTERSPEECH 2006 - ICSLP
This paper addresses the problem of segmenting audio data recorded with embedded devices for intelligent sensing in multi-modal interactions. We propose a real-time method for robust speech detection in natural, noisy environments. It is based on a fusion of high-order statistics of the LPC residual and autocorrelation, and adopts an on-line version of the Expectation-Maximization (EM) algorithm for classification. Experimental evaluations show that the proposed method provides better detection performance under different types of natural noise, and works robustly against other voices in multi-speaker interactive situations. Because the proposed method relies on features with a low computational cost and has a small latency, it is suitable for real-time tracking applications.
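As an illustration of the kind of feature the abstract refers to, the following minimal sketch computes the LPC residual of one frame and its excess kurtosis (a fourth-order statistic). It is not the authors' enhanced cumulant, nor their fusion with autocorrelation or their on-line EM classifier; plain excess kurtosis is used here as a stand-in. The function names, the LPC order, the frame length, and the synthetic test signal are all assumptions made for this example.

```python
import random


def autocorr(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Prediction-error filter a[0..order] (a[0] = 1) from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lpc_residual(frame, order):
    """Filter the frame with its own prediction-error filter."""
    a = levinson_durbin(autocorr(frame, order), order)
    return [sum(a[j] * frame[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(frame))]


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (about 0 for Gaussian data)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0.0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0


def synth_voiced(length=400, period=80):
    """Hypothetical voiced-like frame: impulse train through a stable AR(2) filter."""
    y = []
    for n in range(length):
        x = 1.0 if n % period == 0 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x + 1.3 * y1 - 0.8 * y2)
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
    print("voiced-like:", excess_kurtosis(lpc_residual(synth_voiced(), 2)))
    print("white noise:", excess_kurtosis(lpc_residual(noise, 2)))
```

On the synthetic voiced-like frame the residual is close to the sparse excitation pulses, so its kurtosis is large, whereas the residual of Gaussian noise stays near zero. A per-frame value of this kind is the sort of feature that a classifier, such as the on-line EM mentioned above, could then separate into speech and non-speech.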
Bibliographic reference. Cournapeau, David / Kawahara, Tatsuya / Mase, Kenji / Toriyama, Tomoji (2006): "Voice activity detector based on enhanced cumulant of LPC residual and on-line EM algorithm", In INTERSPEECH-2006, paper 1375-Tue3A1O.1.