15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Towards Improving Statistical Model Based Voice Activity Detection

Ming Tu, Xiang Xie, Yishan Jiao

BIT, China

Statistical model based voice activity detection (VAD) is widely used in speech-related research and applications. In this paper, we aim to improve the performance of statistical model based VAD through a new feature extraction method. Our main innovation is to use Mel-frequency subband coefficients with a power-law nonlinearity, instead of Discrete Fourier Transform (DFT) coefficients, as the features for statistical model based VAD. The proposed features are then modeled by a Gaussian distribution. The performance of this method is compared comprehensively with that of existing methods. We also apply the power-law nonlinearity to the existing methods. Experimental results show that the proposed subband coefficients substantially improve the performance of statistical model based VAD, and that applying the power-law nonlinearity to DFT coefficients also yields some improvement.
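The abstract does not give the exact feature or decision settings, so the Python sketch below only illustrates the general idea under stated assumptions. Mel subband energies are compressed with a power law, a per-subband Gaussian noise model is fitted on leading frames assumed to be speech-free, and each frame is classified by averaging per-subband Gaussian log-likelihood ratios. The exponent 1/15 (borrowed from PNCC), the speech-present model (same variance, mean shifted by a hypothetical `delta` standard deviations), the frame sizes, the 23-filter bank, the synthetic test signal and all function names are illustrative assumptions, not the authors' method.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filterbank of shape (n_filters, n_fft // 2 + 1)."""
    # Filter edges equally spaced on the Mel scale from 0 Hz to Nyquist.
    mels = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):  # rising slope
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope
            fb[i, k] = (right - k) / (right - center)
    return fb


def frame_signal(x, frame_len, hop):
    """Slice x into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def mel_power_law_features(x, sr, frame_len=400, hop=160, n_fft=512,
                           n_filters=23, exponent=1.0 / 15.0):
    """Mel subband energies compressed by a power-law nonlinearity (assumed exponent)."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    subband = power @ mel_filterbank(n_filters, n_fft, sr).T
    return subband ** exponent


def gaussian_lrt_vad(features, n_init=20, delta=4.0, eta=0.0):
    """Frame decisions from averaged per-subband Gaussian log-likelihood ratios.

    H0 (noise):  x_k ~ N(mu_k, sigma_k^2), fitted on the first n_init frames.
    H1 (speech): x_k ~ N(mu_k + delta * sigma_k, sigma_k^2), a hypothetical model.
    """
    mu = features[:n_init].mean(axis=0)
    sigma = features[:n_init].std(axis=0) + 1e-12  # floor avoids division by zero
    z = (features - mu) / sigma
    llr = delta * z - 0.5 * delta ** 2  # log N(x; H1) - log N(x; H0) per subband
    score = llr.mean(axis=1)  # log of the geometric mean of the likelihood ratios
    return score > eta, score


# Hypothetical demo: 0.5 s of white noise, then noise plus a 150 Hz harmonic tone.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
noise = 0.01 * rng.standard_normal(sr)
voiced = sum(0.3 / h * np.sin(2 * np.pi * 150 * h * t) for h in range(1, 26))
x = noise + np.where(t >= 0.5, voiced, 0.0)
is_speech, score = gaussian_lrt_vad(mel_power_law_features(x, sr))
print(is_speech[20:48].mean(), is_speech[50:].mean())
```

With an equal-variance Gaussian pair, the per-subband log-likelihood ratio reduces to `delta * z - delta**2 / 2`, so averaging it across subbands amounts to thresholding the mean standardized deviation of the compressed features from the noise model. A real system would also need noise-model updating and a hangover scheme, neither of which is specified in the abstract.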


Bibliographic reference.  Tu, Ming / Xie, Xiang / Jiao, Yishan (2014): "Towards improving statistical model based voice activity detection", In INTERSPEECH-2014, 1549-1552.