Voice activity detection (VAD) is an important frontend of many speech processing systems. In this paper, we describe a new VAD algorithm based on boosted deep neural networks (bDNNs). The proposed algorithm first generates multiple base predictions for a single frame from only one DNN and then aggregates the base predictions for a better prediction of the frame. Moreover, we employ a new acoustic feature, multi-resolution cochleagram (MRCG), that concatenates the cochleagram features at multiple spectrotemporal resolutions and shows superior speech separation results over many acoustic features. Experimental results show that bDNN-based VAD with the MRCG feature outperforms state-of-the-art VADs by a considerable margin.
Bibliographic reference. Zhang, Xiao-Lei / Wang, DeLiang (2014): "Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection", In INTERSPEECH-2014, 1534-1538.