15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Boosted Deep Neural Networks and Multi-Resolution Cochleagram Features for Voice Activity Detection

Xiao-Lei Zhang (1), DeLiang Wang (2)

(1) Tsinghua University, China
(2) Ohio State University, USA

Voice activity detection (VAD) is an important front end of many speech processing systems. In this paper, we describe a new VAD algorithm based on boosted deep neural networks (bDNNs). The proposed algorithm first generates multiple base predictions for each frame from a single DNN and then aggregates these base predictions into a better prediction for that frame. Moreover, we employ a new acoustic feature, the multi-resolution cochleagram (MRCG), which concatenates cochleagram features at multiple spectrotemporal resolutions and has shown speech separation results superior to those of many other acoustic features. Experimental results show that bDNN-based VAD with the MRCG feature outperforms state-of-the-art VADs by a considerable margin.
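The aggregation step described in the abstract can be illustrated with a small sketch. The abstract does not say how a single DNN yields several base predictions per frame, so the following rests on an assumption: the network, applied at frame t, outputs one speech score for every frame in a window of 2W+1 frames centered on t, so that each frame collects scores from all neighboring windows that cover it. The function name, the array layout, and the use of a plain average are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def aggregate_base_predictions(window_scores, half_width):
    """Average the base predictions that overlapping windows make for each frame.

    Assumed layout (hypothetical, not taken from the paper):
    window_scores[t, k] is the speech score that the window centered on
    frame t assigns to frame t + (k - half_width).
    """
    window_scores = np.asarray(window_scores, dtype=float)
    num_frames, window_len = window_scores.shape
    if window_len != 2 * half_width + 1:
        raise ValueError("expected 2 * half_width + 1 scores per window")

    totals = np.zeros(num_frames)
    counts = np.zeros(num_frames)
    for k in range(window_len):
        offset = k - half_width
        # Keep only window centers t whose slot k points at an existing frame.
        t_start = max(0, -offset)
        t_stop = min(num_frames, num_frames - offset)
        targets = np.arange(t_start, t_stop) + offset
        totals[targets] += window_scores[t_start:t_stop, k]
        counts[targets] += 1

    # Every frame is covered at least once, by its own window (offset 0).
    return totals / counts


# Toy example: 3 frames, windows of 3 scores (half_width = 1).
# Slots that point outside the signal (marked 9.0) are ignored.
toy_scores = np.array([
    [9.0, 0.2, 0.4],
    [0.6, 0.8, 1.0],
    [0.0, 0.5, 9.0],
])
averaged = aggregate_base_predictions(toy_scores, half_width=1)
is_speech = averaged >= 0.5  # the 0.5 threshold is an arbitrary illustration
```

In this toy case the middle frame receives three base predictions (0.4, 0.8 and 0.0) while the edge frames receive two each, which shows how frames near the boundary of the signal are averaged over fewer predictions.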


Bibliographic reference. Zhang, Xiao-Lei / Wang, DeLiang (2014): "Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection", in INTERSPEECH-2014, 1534-1538.