12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Adaptive Regularization Framework for Robust Voice Activity Detection

Xugang Lu (1), Masashi Unoki (2), Ryosuke Isotani (1), Hisashi Kawai (1), Satoshi Nakamura (1)

(1) NICT, Japan
(2) JAIST, Japan

Traditional voice activity detection (VAD) algorithms work well under clean conditions, but their performance degrades drastically in noisy environments. We have investigated the tradeoff between false acceptance rate (FAR) and false rejection rate (FRR) in VAD, taking into account the noise reduction and speech distortion problem in speech enhancement, and proposed a regularization framework for noise reduction in the design of VAD algorithms. In that framework, the balance between FAR and FRR is implicitly controlled by a regularization parameter. In addition, the regularization is carried out in a reproducing kernel Hilbert space (RKHS), which makes it easy to apply a nonlinear transform function for noise reduction via the "kernel trick". Under this framework, a better tradeoff between FAR and FRR was obtained in VAD. In this study, considering the non-stationarity of speech and noise, we further develop an adaptive regularization framework in which the regularization parameter changes adaptively according to local variations of the signal-to-noise ratio (SNR). We tested our algorithm in VAD experiments and compared it with several typical VAD algorithms. The results showed that the proposed algorithm could improve the robustness of VAD.
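
To make the FAR/FRR tradeoff mentioned above concrete, the following is a minimal sketch of how the two rates can be computed from frame-level decisions. It is not the authors' method: the helper names (far_frr, threshold_vad), the fixed score threshold, and the toy scores and labels are all assumptions made for illustration. FAR is taken here as the fraction of non-speech frames accepted as speech, and FRR as the fraction of speech frames rejected as non-speech.

```python
from typing import List, Sequence, Tuple


def far_frr(decisions: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (FAR, FRR) for frame-level VAD decisions.

    FAR: fraction of non-speech frames that were accepted as speech.
    FRR: fraction of speech frames that were rejected as non-speech.
    """
    n_speech = sum(1 for is_speech in labels if is_speech)
    n_noise = len(labels) - n_speech
    false_accepts = sum(1 for d, l in zip(decisions, labels) if d and not l)
    false_rejects = sum(1 for d, l in zip(decisions, labels) if not d and l)
    far = false_accepts / n_noise if n_noise else 0.0
    frr = false_rejects / n_speech if n_speech else 0.0
    return far, frr


def threshold_vad(scores: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical detector: mark a frame as speech if its score exceeds the threshold."""
    return [s > threshold for s in scores]


# Toy per-frame scores and reference labels (True = speech); illustrative only.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.3, 0.2, 0.7]
labels = [False, False, True, True, True, True, False, False]

for t in (0.25, 0.5, 0.75):
    far, frr = far_frr(threshold_vad(scores, t), labels)
    print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

On this toy data, raising the threshold lowers FAR while FRR rises, which is the kind of balance that the abstract describes as being controlled implicitly by the regularization parameter rather than by an explicit threshold.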

Bibliographic reference. Lu, Xugang / Unoki, Masashi / Isotani, Ryosuke / Kawai, Hisashi / Nakamura, Satoshi (2011): "Adaptive regularization framework for robust voice activity detection", in INTERSPEECH-2011, 2653-2656.