This paper presents a voice activity detection (VAD) method based on energy weighted by the a posteriori signal-to-noise ratio (SNR). The motivation is fourfold: 1) the frame-to-frame energy difference discriminates speech well, 2) speech segments should be judged not only by their characteristics but also by their reliability, e.g. as measured by SNR, 3) the a posteriori SNR of noise-only segments is theoretically 0 dB, which is ideal for VAD, and 4) both energy and a posteriori SNR are easy to estimate, so complexity is low. Experiments show the method to be superior to a number of reference methods and standards.
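As a rough illustration of the idea, the following Python sketch combines the frame-to-frame energy difference with an a posteriori SNR weight. It is not the authors' implementation: the abstract does not give the formula, so the product-and-square-root combination, the noise estimate taken from the first few frames, the function names, and the threshold are all assumptions made for this example.

```python
import math


def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def snr_weighted_energy_distance(energies, noise_frames=5, floor=1e-12):
    """Per-frame distance: energy difference weighted by a posteriori SNR.

    Assumption: the first `noise_frames` frames contain noise only and
    give the noise energy estimate. The weighting below (product of the
    absolute energy difference and the SNR in dB, clipped at 0, under a
    square root) is an illustrative choice, not a formula from the abstract.
    """
    if not energies:
        return []
    head = energies[:noise_frames]
    noise_energy = max(sum(head) / len(head), floor)
    distances = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(energies, energies[1:]):
        # A posteriori SNR: observed (noisy) frame energy over noise energy.
        snr_db = 10.0 * math.log10(max(cur, floor) / noise_energy)
        weight = max(snr_db, 0.0)  # near 0 dB for noise-only frames
        distances.append(math.sqrt(abs(cur - prev) * weight))
    return distances


def detect(distances, threshold):
    """Flag frames whose distance exceeds a (hypothetical) threshold."""
    return [d > threshold for d in distances]


# Synthetic signal: 10 quiet frames followed by 5 loud frames.
signal = [0.01, -0.01] * 20 + [1.0, -1.0] * 10
energies = frame_energies(signal, frame_len=4)
distances = snr_weighted_energy_distance(energies)
flags = detect(distances, threshold=1.0)
```

In this toy signal only the frame at the quiet-to-loud transition is flagged, because the distance responds to changes in energy rather than to the energy level itself. A complete detector would need further logic to turn such frame-level cues into speech segments, and the abstract does not describe that step.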
Cite as: Tan, Z.-H., Lindberg, B. (2009) High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy. Proc. Interspeech 2009, 2231-2234, doi: 10.21437/Interspeech.2009-634
@inproceedings{tan09_interspeech,
  author={Zheng-Hua Tan and Børge Lindberg},
  title={{High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2231--2234},
  doi={10.21437/Interspeech.2009-634}
}