11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

A New VAD Framework Using Statistical Model and Human Knowledge Based Empirical Rule

Ji Wu, Xiao-lei Zhang, Wei Li

Tsinghua University, China

This paper presents a new voice activity detection (VAD) framework based on empirical rules and statistical models. The framework first detects candidate endpoints efficiently in the time domain, using empirical rules derived from human knowledge and the continuity of speech. It then confirms the candidate endpoints in the transform domain, using different confirmation schemes for the beginning point and the ending point. For the transform-domain stage, a new algorithm called sliding-window double-layer confirmation (SWDC) is proposed to confirm the endpoints accurately, and the use of sensitive data for Gaussian mixture model (GMM) training is proposed for the detection scheme. Experiments show that the proposed VAD framework achieves better performance in various environmental conditions.
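The abstract does not spell out the empirical rules, so the following is only a minimal sketch of what a time-domain candidate-endpoint stage could look like, not the authors' method. It assumes a short-time energy threshold plus two continuity rules: brief pauses inside speech are bridged, and very short bursts are discarded. The function names, the threshold, and the parameters `min_speech_frames` and `max_gap_frames` are all hypothetical. The transform-domain confirmation stage (SWDC and the GMM) is not sketched here.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (trailing samples dropped)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def candidate_endpoints(
    energies: List[float],
    threshold: float,
    min_speech_frames: int = 3,
    max_gap_frames: int = 2,
) -> List[Tuple[int, int]]:
    """Return (begin, end) frame indices of candidate speech segments, end inclusive.

    Assumed continuity rules (illustrative, not taken from the paper):
      * a run of at most `max_gap_frames` low-energy frames inside speech is bridged;
      * a segment shorter than `min_speech_frames` is discarded as a spurious burst.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # first frame of the segment currently being tracked
    last_active = None  # most recent frame whose energy exceeded the threshold
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active > max_gap_frames:
            # The pause is too long to bridge: close the segment at the last active frame.
            if last_active - start + 1 >= min_speech_frames:
                segments.append((start, last_active))
            start = None
    # Close a segment that is still open when the signal ends.
    if start is not None and last_active - start + 1 >= min_speech_frames:
        segments.append((start, last_active))
    return segments
```

In a two-stage design like the one described, each `(begin, end)` pair returned here would only be a candidate; a separate, more expensive classifier would then accept or reject each endpoint.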


Bibliographic reference. Wu, Ji / Zhang, Xiao-lei / Li, Wei (2010): "A new VAD framework using statistical model and human knowledge based empirical rule", In INTERSPEECH-2010, 3090-3093.