This paper proposes a novel hands-free voice activity detection (VAD) method, called adaptive zero crossing detection (AZCD), that uses talker direction estimation to exploit spatial features in addition to temporal features. It first estimates the talker direction to extract two spatial features, spatial reliability and spatial variance, based on weighted cross-power spectrum phase analysis and maximum likelihood estimation. The AZCD then detects voice activity frames in noisy environments by robustly detecting the zero crossing information of speech, with thresholds adaptively controlled by the extracted spatial features. Experimental results in an actual office room confirmed that the VAD performance of the proposed method, which utilizes both temporal and spatial features, is superior to that of conventional methods that utilize only temporal or only spatial features.
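The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: plain (unweighted) cross-power spectrum phase analysis stands in for the weighted variant, the CSP peak value serves as a stand-in for spatial reliability, spatial variance and the maximum likelihood step are omitted, and the threshold rule with its parameter alpha is hypothetical. All function names are illustrative.

```python
import numpy as np

def csp_delay(x1, x2, max_lag):
    """Estimate the delay of x2 relative to x1 (in samples) with plain CSP.

    Assumption: unweighted CSP (GCC-PHAT); max_lag must be at least 1.
    Returns the lag of the CSP peak and the peak value.
    """
    n = len(x1) + len(x2)               # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12      # keep only the phase of the cross spectrum
    csp = np.fft.irfft(cross, n)
    # Reorder so that the lags run from -max_lag to +max_lag.
    csp = np.concatenate((csp[-max_lag:], csp[:max_lag + 1]))
    peak = int(np.argmax(csp))
    return peak - max_lag, float(csp[peak])

def adaptive_threshold(base_threshold, reliability, alpha=0.5):
    """Hypothetical rule: lower the amplitude threshold when reliability is high."""
    reliability = float(np.clip(reliability, 0.0, 1.0))
    return base_threshold * (1.0 - alpha * reliability)

def thresholded_zero_crossings(frame, amp_threshold):
    """Count sign changes among samples whose magnitude exceeds amp_threshold."""
    frame = np.asarray(frame, dtype=float)
    strong = frame[np.abs(frame) > amp_threshold]
    if strong.size < 2:
        return 0
    signs = np.sign(strong)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

def is_voice_frame(frame, base_threshold, reliability, min_crossings=4):
    """Flag a frame as voice when enough thresholded zero crossings occur."""
    threshold = adaptive_threshold(base_threshold, reliability)
    return thresholded_zero_crossings(frame, threshold) >= min_crossings

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mic1 = rng.standard_normal(512)
    mic2 = np.concatenate((np.zeros(3), mic1[:-3]))   # mic1 delayed by 3 samples
    lag, peak = csp_delay(mic1, mic2, max_lag=16)
    print("estimated lag:", lag)

    t = np.arange(512)
    tone = np.sin(2 * np.pi * 200 * t / 16000)        # stand-in for a voiced frame
    print("voice frame:", is_voice_frame(tone, 0.2, peak))
```

In this sketch the amplitude threshold makes the zero crossing count insensitive to low-level noise, and lowering it when the direction estimate is reliable makes detection more permissive for a talker at a stable direction. How the paper actually maps spatial reliability and spatial variance to thresholds is not specified in the abstract.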
Bibliographic reference. Denda, Yuki / Tanaka, Takamasa / Nakayama, Masato / Nishiura, Takanobu / Yamashita, Yoichi (2007): "Noise-robust hands-free voice activity detection with adaptive zero crossing detection using talker direction estimation", In INTERSPEECH-2007, 222-225.