ISCA Archive Interspeech 2009

High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy

Zheng-Hua Tan, Børge Lindberg

This paper presents a voice activity detection (VAD) method based on a posteriori signal-to-noise ratio (SNR) weighted energy. The motivations are fourfold: 1) the frame-to-frame difference in energy discriminates speech signals well, 2) speech segments are assessed not only by their characteristics but also by their reliability, e.g. as measured by SNR, 3) the a posteriori SNR of noise-only segments is theoretically equal to 0 dB, which is ideal for VAD, and 4) both energy and a posteriori SNR are easy to estimate, resulting in low complexity. The method is experimentally shown to be superior to a number of reference methods and standards.
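
The four motivations can be illustrated with a minimal Python sketch. It is not taken from the paper: the abstract gives no formula, so the combination rule (square root of the log-energy difference times the a posteriori SNR clipped at 0 dB), the non-overlapping framing, the noise estimate from the first few frames, and all function names are assumptions made for illustration only.

```python
import math

FLOOR = 1e-12  # assumed floor to avoid log(0); not from the paper


def frame_energies(samples, frame_len):
    """Energy (sum of squares) of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def posteriori_snr_db(energy, noise_energy):
    """A posteriori SNR in dB: noisy-frame energy over estimated noise energy."""
    return 10.0 * math.log10(max(energy, FLOOR) / max(noise_energy, FLOOR))


def weighted_energy_scores(energies, noise_energy):
    """Per-frame score: frame-to-frame log-energy difference weighted by
    the a posteriori SNR, clipped at 0 dB (assumed combination rule)."""
    scores = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(energies, energies[1:]):
        delta = abs(math.log(max(cur, FLOOR)) - math.log(max(prev, FLOOR)))
        snr = max(posteriori_snr_db(cur, noise_energy), 0.0)
        scores.append(math.sqrt(delta * snr))
    return scores


if __name__ == "__main__":
    frame_len = 160
    # Synthetic signal: 3 low-level "noise" frames, 2 loud frames, 3 noise frames.
    signal = [0.01] * (3 * frame_len) + [1.0] * (2 * frame_len) + [0.01] * (3 * frame_len)
    energies = frame_energies(signal, frame_len)
    # Assumption: the first three frames contain noise only.
    noise_energy = sum(energies[:3]) / 3
    for i, score in enumerate(weighted_energy_scores(energies, noise_energy)):
        print(i, round(score, 3))
```

In this toy signal the score is large only at the noise-to-speech transition: the energy jump coincides with a high a posteriori SNR there, whereas at the speech-to-noise transition the SNR weight is close to 0 dB and suppresses the score. A practical detector would also need a decision threshold and a noise estimate that adapts over time, neither of which is sketched here.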


doi: 10.21437/Interspeech.2009-634

Cite as: Tan, Z.-H., Lindberg, B. (2009) High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy. Proc. Interspeech 2009, 2231-2234, doi: 10.21437/Interspeech.2009-634

@inproceedings{tan09_interspeech,
  author={Zheng-Hua Tan and Børge Lindberg},
  title={{High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2231--2234},
  doi={10.21437/Interspeech.2009-634}
}