11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Spectral Entropy-Based Voice Activity Detector for Videoconferencing Systems

Bowon Lee (1), Debargha Muhkerjee (2)

(1) Hewlett-Packard Laboratories, USA
(2) Google, USA

This paper proposes a statistical voice activity detector (VAD) suitable for videoconferencing applications, where detecting higher-level speech activities (e.g., sentences rather than syllables, words, or phrases) is useful. The proposed method uses two distinct features for VAD, spectral entropy and signal energy, which are modeled as Gaussian and chi-square distributions, respectively. Voice activity is determined by computing the joint probability under this statistical model, which serves as a soft measure. Experimental results show that the proposed method is better suited to detecting high-level speech activities than traditional methods used for speech coding or automatic speech recognition.
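The two features and the joint score described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distributions are parameterized, which class (speech or noise) they describe, or how the two features are combined. The sketch assumes the features are independent, so the joint log-density is a sum, and that the frame energy divided by a scale factor follows a chi-square distribution. All function names and the parameters `ent_mean`, `ent_std`, `energy_scale`, and `dof` are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum over bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (in nats) of the normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # all-zero frame: entropy is undefined, 0 by convention here
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def frame_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def gaussian_logpdf(x, mean, std):
    """Log-density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def chi2_logpdf(x, dof):
    """Log-density of a chi-square distribution with `dof` degrees of freedom."""
    if x <= 0.0:
        return float("-inf")
    half = dof / 2.0
    return ((half - 1.0) * math.log(x) - x / 2.0
            - half * math.log(2.0) - math.lgamma(half))


def joint_log_density(frame, ent_mean, ent_std, energy_scale, dof):
    """Joint log-density of (entropy, energy) for one frame.

    Assumption (not from the paper): the two features are independent and
    energy / energy_scale is chi-square distributed with `dof` degrees.
    """
    h = spectral_entropy(frame)
    e = frame_energy(frame)
    # Change of variables: p_E(e) = p_chi2(e / scale) / scale.
    log_p_energy = chi2_logpdf(e / energy_scale, dof) - math.log(energy_scale)
    return gaussian_logpdf(h, ent_mean, ent_std) + log_p_energy
```

A flat spectrum (for example, a single impulse) gives the maximum entropy for the number of bins, while a pure tone gives an entropy near zero. Under these assumptions, a detector would estimate the model parameters from training frames and use the per-frame score as a soft decision value, possibly smoothed over time to capture sentence-level activity.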


Bibliographic reference. Lee, Bowon / Muhkerjee, Debargha (2010): "Spectral entropy-based voice activity detector for videoconferencing systems", In INTERSPEECH-2010, 3106-3109.