INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Noise-Robust Hands-Free Voice Activity Detection with Adaptive Zero Crossing Detection Using Talker Direction Estimation

Yuki Denda, Takamasa Tanaka, Masato Nakayama, Takanobu Nishiura, Yoichi Yamashita

Ritsumeikan University, Japan

This paper proposes a novel hands-free voice activity detection (VAD) method, called adaptive zero crossing detection (AZCD), that uses talker direction estimation to exploit spatial features in addition to temporal features. It first estimates the talker direction to extract two spatial features, spatial reliability and spatial variance, based on weighted cross-power spectrum phase analysis and maximum likelihood estimation. The AZCD then detects voice activity frames in noisy environments by robustly detecting the zero crossing information of speech, with thresholds adaptively controlled by the extracted spatial features. Experimental results in an actual office room confirmed that the VAD performance of the proposed method, which utilizes both temporal and spatial features, is superior to that of conventional methods that utilize only temporal or only spatial features.
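As a rough illustration of the zero crossing idea described above, the following Python sketch counts zero crossings in a frame while ignoring samples inside an amplitude dead band, and scales that dead band with a spatial reliability score. The abstract does not give the actual AZCD update rule, so everything here is an assumption made for illustration: the function names, the `[0, 1]` range of `spatial_reliability`, the linear scaling in `adapt_threshold`, and the crossing-count limits are all hypothetical and are not taken from the paper.

```python
import math


def count_zero_crossings(frame, threshold):
    """Count sign changes in a frame, skipping samples with |x| <= threshold.

    The dead band keeps low-level noise from producing spurious crossings.
    """
    crossings = 0
    last_sign = 0  # sign of the most recent sample outside the dead band
    for x in frame:
        if abs(x) <= threshold:
            continue
        sign = 1 if x > 0 else -1
        if last_sign != 0 and sign != last_sign:
            crossings += 1
        last_sign = sign
    return crossings


def adapt_threshold(base_threshold, spatial_reliability):
    """Hypothetical adaptation rule (NOT the paper's): shrink the dead band
    when the talker direction estimate is reliable, widen it otherwise."""
    r = min(max(spatial_reliability, 0.0), 1.0)  # clamp to the assumed [0, 1] range
    return base_threshold * (1.5 - r)


def is_voice_frame(frame, base_threshold, spatial_reliability,
                   min_crossings=2, max_crossings=60):
    """Flag a frame as voice if its crossing count lies in an assumed range."""
    threshold = adapt_threshold(base_threshold, spatial_reliability)
    n = count_zero_crossings(frame, threshold)
    return min_crossings <= n <= max_crossings


# Usage: a 200 Hz tone sampled at 8 kHz (20 ms frame) versus low-level noise.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
noise = [0.01 * (-1) ** n for n in range(160)]
tone_is_voice = is_voice_frame(tone, 0.1, spatial_reliability=0.9)
noise_is_voice = is_voice_frame(noise, 0.1, spatial_reliability=0.9)
```

In this toy setup the tone is flagged as voice and the low-level alternating noise is not, because every noise sample falls inside the dead band. A real system would derive the reliability score from the direction estimator rather than passing it in by hand.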


Bibliographic reference.  Denda, Yuki / Tanaka, Takamasa / Nakayama, Masato / Nishiura, Takanobu / Yamashita, Yoichi (2007): "Noise-robust hands-free voice activity detection with adaptive zero crossing detection using talker direction estimation", In INTERSPEECH-2007, 222-225.