11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Voice Activity Detection Using Frame-Wise Model Re-Estimation Method Based on Gaussian Pruning with Weight Normalization

Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani

NTT Corporation, Japan

This paper proposes a model re-estimation method based on Gaussian pruning for noise-robust voice activity detection (VAD). Our previous work, a switching Kalman filter-based VAD, sequentially estimates a noise Gaussian mixture model (GMM) and constructs GMMs of the observed noisy speech signal by composing pre-trained silence and clean GMMs with noise GMMs. However, the composed models are not optimal, because they do not reflect the characteristics of the observed signal. To ensure the optimality of the composed models, we investigate a method for re-estimating them. Because our VAD processes the signal sequentially, however, too little re-training data is available for a satisfactory model re-estimation. We therefore propose a model re-estimation method that extracts beneficial information using Gaussian pruning. The proposed method re-estimates the model by pruning non-dominant Gaussian distributions in each frame, and it improves VAD accuracy in noise.
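
The abstract does not state the pruning criterion or the re-estimation formulas, so the following is only a minimal sketch of the general idea of frame-wise Gaussian pruning with weight normalization, not the authors' implementation. It assumes a diagonal-covariance GMM and treats a component as "dominant" when its posterior probability for the current frame reaches a threshold. The function names, the posterior-based criterion, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def prune_and_normalize(x, weights, means, variances, threshold=0.05):
    """Prune non-dominant components for one frame and renormalize weights.

    Assumption (not from the paper): a component is kept when its posterior
    probability given frame x is at least `threshold`.
    Returns (kept_indices, normalized_weights).
    """
    log_joint = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp keeps the posterior computation numerically stable.
    peak = max(log_joint)
    log_total = peak + math.log(sum(math.exp(lj - peak) for lj in log_joint))
    posteriors = [math.exp(lj - log_total) for lj in log_joint]

    kept = [k for k, p in enumerate(posteriors) if p >= threshold]
    if not kept:
        # Fall back to the single most dominant component.
        kept = [max(range(len(posteriors)), key=posteriors.__getitem__)]

    # Weight normalization: the surviving weights are rescaled to sum to one.
    total = sum(weights[k] for k in kept)
    return kept, [weights[k] / total for k in kept]


if __name__ == "__main__":
    # Toy one-dimensional frame and a three-component GMM (made-up values).
    frame = [0.1]
    weights = [0.5, 0.3, 0.2]
    means = [[0.0], [0.5], [10.0]]
    variances = [[1.0], [1.0], [1.0]]
    print(prune_and_normalize(frame, weights, means, variances))
```

In this toy case the component centered far from the frame receives a negligible posterior and is dropped, and the two remaining weights are rescaled so that the pruned mixture is still a valid distribution. How the pruned mixture would then feed the model re-estimation is described in the full paper and is not reproduced here.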

Bibliographic reference. Fujimoto, Masakiyo / Watanabe, Shinji / Nakatani, Tomohiro (2010): "Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization", in INTERSPEECH-2010, 3102-3105.