This paper presents a voice activity detection (VAD) method based on a posteriori signal-to-noise ratio (SNR) weighted energy. The motivations are fourfold: 1) the frame-to-frame energy difference discriminates speech well, 2) speech segments should be assessed not only on their characteristics but also on their reliability, e.g. as measured by SNR, 3) the a posteriori SNR of noise-only segments is theoretically 0 dB, which is ideal for VAD, and 4) both energy and a posteriori SNR are easy to estimate, giving low complexity. Experiments show the method to be superior to a number of reference methods and standards.
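As an illustration of the quantities named above, the following is a minimal sketch and not the authors' implementation. The abstract does not give the exact formula, so the combination rule (square root of the absolute energy difference times the SNR in dB, floored at 0) is an assumption, as are the function names, the non-overlapping framing, and the noise energy being supplied by the caller.

```python
import math


def frame_energies(samples, frame_len):
    """Return the energy (sum of squares) of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def a_posteriori_snr_db(energy, noise_energy, eps=1e-12):
    """A posteriori SNR in dB: observed frame energy over noise energy.

    eps is a hypothetical guard against division by zero and log of zero.
    """
    return 10.0 * math.log10((energy + eps) / (noise_energy + eps))


def snr_weighted_energy_difference(energies, noise_energy):
    """Weight each frame-to-frame energy difference by the frame's SNR.

    Assumed combination rule: sqrt(|E[t] - E[t-1]| * max(SNR_dB[t], 0)).
    The first frame has no predecessor, so its score is set to 0.
    """
    scores = [0.0]
    for prev, cur in zip(energies, energies[1:]):
        snr = max(a_posteriori_snr_db(cur, noise_energy), 0.0)
        scores.append(math.sqrt(abs(cur - prev) * snr))
    return scores


if __name__ == "__main__":
    # Synthetic signal: two quiet frames, two loud frames, two quiet frames.
    frame_len = 160
    samples = [0.01] * (2 * frame_len) + [0.5] * (2 * frame_len) + [0.01] * (2 * frame_len)
    energies = frame_energies(samples, frame_len)
    noise_energy = energies[0]  # assumption: the first frame is noise only
    print(snr_weighted_energy_difference(energies, noise_energy))
```

In this sketch a frame whose energy equals the noise estimate has an a posteriori SNR of 0 dB, so its score is zero regardless of the energy difference, which mirrors motivation 3). A usable detector would still need a decision threshold and some smoothing over time, neither of which the abstract specifies.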
Bibliographic reference. Tan, Zheng-Hua / Lindberg, Børge (2009): "High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy", In INTERSPEECH-2009, 2231-2234.