This paper proposes a statistical voice activity detector (VAD) suitable for videoconferencing applications, where detecting higher-level speech activities (e.g., sentences rather than syllables, words, or phrases) is useful. The proposed method uses two distinct features for VAD, spectral entropy and signal energy, which are modeled as Gaussian and chi-square distributions, respectively. Voice activity is determined by computing the joint probability under this statistical model, which serves as a soft measure. Experimental results show that the proposed method is better suited to detecting high-level speech activities than the traditional methods used for speech coding or automatic speech recognition.
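As a rough illustration of the two features and their joint use, the following Python sketch computes the spectral entropy and energy of one frame and multiplies a Gaussian density for the entropy by a chi-square density for the scaled energy. It is not the authors' implementation. The abstract does not state which class the distributions describe, how their parameters are estimated, or whether the features are treated as independent. The sketch assumes the models describe noise-only frames, that the two features are independent, and that the energy is scaled by a known noise variance so that it follows a chi-square distribution with as many degrees of freedom as samples per frame. All function and parameter names (noise_joint_density, entropy_mean, entropy_std, noise_var) are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (in nats) of the normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        # Assumption: an all-zero frame is treated as having a flat spectrum.
        return math.log(len(power))
    return -sum((p / total) * math.log(p / total) for p in power if p > 0.0)


def frame_energy(frame):
    """Sum of squared samples."""
    return sum(x * x for x in frame)


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def chi_square_pdf(x, dof):
    """Chi-square density, evaluated in the log domain for numerical stability."""
    if x <= 0.0:
        return 0.0
    half = dof / 2.0
    log_pdf = ((half - 1.0) * math.log(x) - x / 2.0
               - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_pdf)


def noise_joint_density(frame, entropy_mean, entropy_std, noise_var):
    """Joint density of (entropy, scaled energy) under the assumed noise model.

    Assumption: the features are independent, so the joint density is the
    product of the two marginal densities.
    """
    h = spectral_entropy(frame)
    e = frame_energy(frame) / noise_var
    return gaussian_pdf(h, entropy_mean, entropy_std) * chi_square_pdf(e, len(frame))
```

Under these assumptions, a small joint density means the frame is unlikely to be background noise, so the value can be used as a soft score and smoothed over time instead of being thresholded frame by frame. A single unit impulse has a flat spectrum and therefore maximal entropy, while a pure tone concentrates its power in one bin and has entropy close to zero.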
Bibliographic reference. Lee, Bowon / Mukherjee, Debargha (2010): "Spectral entropy-based voice activity detector for videoconferencing systems", In INTERSPEECH-2010, 3106-3109.