Robust Voice Activity Detection Using Frequency Domain Long-Term Differential Entropy

Debayan Ghosh, Muralishankar R, Sanjeev Gurugopinath


We propose a novel voice activity detection (VAD) scheme that computes the differential entropy at each frequency bin of the power spectral estimates of past and present overlapping speech frames. The power spectral estimate is obtained with the Bartlett-Welch method. We then sum the entropies across frequency bins and call the result the frequency domain long-term differential entropy (FLDE). Long-term averaging enhances VAD performance at low signal-to-noise ratio (SNR). We evaluate the proposed FLDE scheme on speech samples from the SWITCHBOARD corpus, to which 12 types of noise are artificially added at 5 different SNRs. We compare the VAD performance of FLDE with that of existing VAD algorithms: ITU-T G.729B, the likelihood ratio test, and algorithms based on long-term signal variability and the long-term spectral flatness measure. Finally, we demonstrate that our FLDE-based VAD achieves the best average accuracy and speech hit-rate among the VAD algorithms evaluated.
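
The pipeline described above (Bartlett-Welch power spectral estimates, a per-bin differential entropy over past and present frames, then a sum across bins) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the entropy estimator, so a Gaussian closed form is assumed here, and the function names `welch_psd_frames` and `flde` as well as the parameters `frame_len`, `hop`, `n_avg`, and `n_long` are hypothetical choices.

```python
import numpy as np


def welch_psd_frames(x, frame_len=256, hop=128, n_avg=4):
    """Bartlett-Welch style PSD track (assumed parameters).

    Each output row is the average of the periodograms of `n_avg`
    consecutive overlapping, Hann-windowed frames.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Periodogram of each frame, normalised by the window energy.
    periodograms = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / np.sum(window ** 2)
    # Average the current periodogram with the n_avg - 1 preceding ones.
    return np.stack(
        [periodograms[m - n_avg + 1 : m + 1].mean(axis=0)
         for m in range(n_avg - 1, n_frames)]
    )


def flde(psd, n_long=30, eps=1e-12):
    """Sum over frequency bins of a per-bin differential entropy.

    ASSUMPTION: the entropy of each bin is estimated with the Gaussian
    closed form 0.5 * log(2 * pi * e * variance), where the variance is
    taken over the current and the n_long - 1 past PSD estimates.
    """
    values = []
    for m in range(n_long - 1, psd.shape[0]):
        block = psd[m - n_long + 1 : m + 1]   # past and present frames
        var = block.var(axis=0) + eps         # per-bin variance over time
        entropy = 0.5 * np.log(2.0 * np.pi * np.e * var)
        values.append(entropy.sum())          # add entropies across bins
    return np.array(values)
```

A VAD decision would then compare each value returned by `flde` against a threshold; how that threshold is chosen is not stated in the abstract and is left open here.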


DOI: 10.21437/Interspeech.2018-1431

Cite as: Ghosh, D., R, M., Gurugopinath, S. (2018) Robust Voice Activity Detection Using Frequency Domain Long-Term Differential Entropy. Proc. Interspeech 2018, 1220-1224, DOI: 10.21437/Interspeech.2018-1431.


@inproceedings{Ghosh2018,
  author={Debayan Ghosh and Muralishankar R and Sanjeev Gurugopinath},
  title={Robust Voice Activity Detection Using Frequency Domain Long-Term Differential Entropy},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1220--1224},
  doi={10.21437/Interspeech.2018-1431},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1431}
}