8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Robust Voice Activity Detection Based on Adaptive Sub-Band Energy Sequence Analysis and Harmonic Detection

Yanmeng Guo (1), Qian Fu (2), Yonghong Yan (1)

(1) Chinese Academy of Sciences, China
(2) Chinese Academy of Sciences, China

Voice activity detection (VAD) in real-world noise is a challenging task. This paper proposes a two-step method to address it. First, segments containing non-stationary components, including both speech and dynamic noise, are located using sub-band energy sequence analysis (SESA). Second, voice is detected within the selected segments using a proposed method based on the harmonic structure of voice. This rule-based framework thus allows speech segments to be detected accurately. The algorithm is evaluated on several databases, both in terms of speech/non-speech discrimination and in terms of word accuracy rate when it is used as the front end of an automatic speech recognition (ASR) system. It performs more reliably than the commonly used standard methods.
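
The two-step idea can be illustrated with a minimal Python sketch. It is not the authors' SESA or their harmonic detector, whose details are not given in this abstract. As stand-in assumptions, step 1 flags a frame when any sub-band's log energy rises above a median-based noise floor, and step 2 accepts a flagged frame only if its normalized autocorrelation has a strong peak in a typical pitch range. All function names, thresholds, and the synthetic test signal are hypothetical.

```python
import cmath
import math
import random
import statistics


def frame_signal(x, frame_len, hop):
    """Split a sample list into fixed-length frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def subband_energies(frame, n_bands):
    """Energy per sub-band from a naive DFT (adequate for short demo frames)."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(1, half + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2 / n)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def harmonicity(frame, min_lag, max_lag):
    """Largest normalized autocorrelation over the given lag range."""
    n = len(frame)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:n - lag], frame[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0:
            best = max(best, num / den)
    return best


def two_step_vad(x, sample_rate=8000, frame_len=240, n_bands=4,
                 rise_db=6.0, harm_thresh=0.5):
    """Return (non_stationary, speech) boolean lists, one entry per frame."""
    frames = frame_signal(x, frame_len, frame_len)
    log_e = [[10 * math.log10(e + 1e-12) for e in subband_energies(f, n_bands)]
             for f in frames]
    # Assumption: the per-band median over the recording approximates the
    # stationary noise floor (valid only if noise-only frames dominate).
    floors = [statistics.median(band) for band in zip(*log_e)]
    # Step 1: frames whose energy rises above the floor in any sub-band.
    active = [any(e - fl > rise_db for e, fl in zip(row, floors)) for row in log_e]
    # Step 2: keep only flagged frames that look periodic in roughly 80-400 Hz.
    min_lag, max_lag = sample_rate // 400, sample_rate // 80
    speech = [a and harmonicity(f, min_lag, max_lag) > harm_thresh
              for a, f in zip(active, frames)]
    return active, speech


def demo_signal(sample_rate=8000, frame_len=240):
    """Synthetic input: quiet noise, a 200 Hz harmonic tone, and a noise burst."""
    rng = random.Random(0)
    layout = [(10, "quiet"), (5, "voiced"), (5, "quiet"), (5, "burst"), (5, "quiet")]
    x = []
    for n_frames, kind in layout:
        for _ in range(n_frames * frame_len):
            t = len(x)
            s = rng.gauss(0.0, 0.01)  # stationary background noise
            if kind == "voiced":
                s += sum(0.5 / h * math.sin(2 * math.pi * 200 * h * t / sample_rate)
                         for h in (1, 2, 3))
            elif kind == "burst":
                s += rng.gauss(0.0, 0.5)  # loud, non-harmonic dynamic noise
            x.append(s)
    return x
```

Calling `two_step_vad(demo_signal())` should flag both the harmonic tone and the noise burst as non-stationary in step 1, while step 2 should keep only the harmonic frames. This mirrors the division of labor described above: energy analysis narrows the search, and harmonic structure separates voice from dynamic noise.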

Bibliographic reference.  Guo, Yanmeng / Qian, Qian / Yan, Yonghong (2007): "Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection", In INTERSPEECH-2007, 2949-2952.