9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Statistical Speech Activity Detection Based on Spatial Power Distribution for Analyses of Poster Presentations

Kentaro Ishizuka (1), Shoko Araki (1), Tatsuya Kawahara (2)

(1) NTT Corporation, Japan; (2) Kyoto University, Japan

This paper proposes a microphone-array-based statistical speech activity detection (SAD) method for analyzing poster presentations recorded in the presence of noise. Poster presentations are a kind of multi-party conversation in which the number of speakers and their locations are unrestricted, and directional noise sources affect the direction of arrival of the target speech signals. To detect speech activity in such cases without a priori knowledge of the speakers or the noise environment, we apply a likelihood-ratio-test-based SAD method to spatial power distributions. The proposed method can exploit the enhanced signals obtained from time-frequency masking, and it works even in the presence of environmental noise by utilizing the a priori signal-to-noise ratios of the spatial power distributions. Experiments with recorded poster presentations confirmed that the proposed method significantly improves SAD accuracy compared with a frequency-spectrum-based statistical SAD method.
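As a rough illustration of the kind of likelihood ratio test the abstract refers to, the sketch below applies the standard Gaussian-model log likelihood ratio to a vector of per-direction powers. It is not the authors' implementation: the abstract gives no formulas, so the per-bin statistic, the maximum-likelihood estimate of the a priori SNR, the treatment of each bin as one direction of the spatial power distribution, the function names, and the threshold value are all assumptions made for the example.

```python
import math
from typing import Sequence


def log_likelihood_ratio(power: Sequence[float],
                         noise_power: Sequence[float],
                         floor: float = 1e-12) -> float:
    """Mean log likelihood ratio over bins under a Gaussian signal model.

    Each bin is assumed (hypothetically) to hold the observed power for one
    direction, with noise_power holding the estimated noise power per bin.
    """
    if len(power) != len(noise_power) or len(power) == 0:
        raise ValueError("power and noise_power must be non-empty and equal in length")
    total = 0.0
    for p, n in zip(power, noise_power):
        gamma = p / max(n, floor)       # a posteriori SNR of this bin
        xi = max(gamma - 1.0, 0.0)      # ML estimate of the a priori SNR (assumption)
        # Per-bin log likelihood ratio: gamma * xi / (1 + xi) - log(1 + xi)
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)
    return total / len(power)


def is_speech(power: Sequence[float],
              noise_power: Sequence[float],
              threshold: float = 0.5) -> bool:
    """Declare speech when the mean log likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(power, noise_power) > threshold


if __name__ == "__main__":
    noise = [1.0, 1.0, 1.0, 1.0]
    print(is_speech([1.0, 1.0, 1.0, 1.0], noise))   # power equal to the noise estimate
    print(is_speech([1.0, 10.0, 1.0, 1.0], noise))  # one direction well above the noise
```

In practice the a priori SNR would more likely be tracked over time rather than estimated from a single frame, and the threshold would be tuned on held-out recordings.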

