10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

High-Accuracy, Low-Complexity Voice Activity Detection Based on a posteriori SNR Weighted Energy

Zheng-Hua Tan, Børge Lindberg

Aalborg University, Denmark

This paper presents a voice activity detection (VAD) method based on the a posteriori signal-to-noise ratio (SNR) weighted energy. The motivations are manifold: 1) the frame-to-frame energy difference provides strong discrimination for speech signals, 2) speech segments are assessed not only by their characteristics but also by their reliability, e.g. as measured by the SNR, 3) the a posteriori SNR of noise-only segments theoretically equals 0 dB, which is ideal for VAD, and 4) both energy and a posteriori SNR are easy to estimate, resulting in low complexity. Experiments show the method to be superior to a number of reference methods and standards.
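The abstract combines two ingredients, the frame-to-frame energy difference and the a posteriori SNR, and notes that the a posteriori SNR of noise-only frames is theoretically 0 dB. The Python sketch below illustrates both ideas under stated assumptions. Frame energy is taken as mean-square energy, and the noise energy is estimated from the first 10 frames, which are assumed to be noise-only. The weighting `sqrt(|delta log E| * max(SNR_dB, 0))` is a hypothetical choice and is not the formula from the paper. All function names are illustrative.

```python
import math
import random


def frame_energies(signal, frame_len):
    """Mean-square energy of each non-overlapping frame (trailing samples dropped)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def a_posteriori_snr_db(energies, noise_energy):
    """Frame-level a posteriori SNR in dB: observed (speech + noise) energy
    divided by the noise energy. For noise-only frames the ratio is close
    to 1, i.e. close to 0 dB."""
    return [10.0 * math.log10(e / noise_energy) for e in energies]


def snr_weighted_energy_diff(energies, noise_energy):
    """Hypothetical VAD feature: absolute log-energy difference between
    consecutive frames, weighted by the a posteriori SNR with negative dB
    values clamped to 0. The square-root-of-product form is an assumption."""
    log_e = [10.0 * math.log10(e) for e in energies]
    snr = a_posteriori_snr_db(energies, noise_energy)
    feature = [0.0]  # no previous frame exists for t = 0
    for t in range(1, len(energies)):
        feature.append(math.sqrt(abs(log_e[t] - log_e[t - 1]) * max(snr[t], 0.0)))
    return feature


def make_test_signal(frame_len=160, n_noise=50, n_tone=50, noise_std=0.1, seed=0):
    """Synthetic input: noise-only frames followed by a unit-amplitude tone in
    the same white Gaussian noise (a stand-in for speech, not real speech)."""
    rng = random.Random(seed)
    total = frame_len * (n_noise + n_tone)
    onset = frame_len * n_noise
    return [
        rng.gauss(0.0, noise_std)
        + (math.sin(2.0 * math.pi * 0.05 * n) if n >= onset else 0.0)
        for n in range(total)
    ]


if __name__ == "__main__":
    frame_len = 160
    signal = make_test_signal(frame_len)
    energies = frame_energies(signal, frame_len)
    # Assumption: the first 10 frames are noise-only and give the noise estimate.
    noise_energy = sum(energies[:10]) / 10
    snr = a_posteriori_snr_db(energies, noise_energy)
    feature = snr_weighted_energy_diff(energies, noise_energy)
    print(f"mean a posteriori SNR, noise-only frames: {sum(snr[:50]) / 50:.2f} dB")
    print(f"mean a posteriori SNR, tone frames: {sum(snr[50:]) / 50:.2f} dB")
    print(f"frame with largest feature value: {feature.index(max(feature))}")
```

Running the script should print a mean a posteriori SNR near 0 dB for the noise-only frames and roughly 17 dB for the tone frames. The feature should peak at the onset frame (index 50), where a large energy jump coincides with a high SNR. A practical detector would also need a decision threshold and smoothing, and both are omitted here.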


Bibliographic reference.  Tan, Zheng-Hua / Lindberg, Børge (2009): "High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy", In INTERSPEECH-2009, 2231-2234.