Voice Activity Detection (VAD) in acoustic environments remains a challenging task because of potentially adverse noise and reverberation conditions. The problem becomes even more difficult when the microphones used to detect speech are located far from the speaker. This paper presents an unsupervised VAD scheme. The system processes signals captured by multiple far-field sensors in order to integrate spatial information in addition to the frequency content available in a single-channel recording. To decide on the presence or absence of speech, the system employs a modified multiple-observation hypothesis test that evaluates, at each sensor, the probability of an active speaker and then fuses the per-sensor decisions. To minimize misdetections and enhance the performance of the hypothesis test, a computationally efficient forgetting scheme is also employed. Simulations conducted in several artificial environments illustrate that significant improvements in performance can be expected from the proposed scheme when compared to systems of similar philosophy.
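The pipeline outlined in the abstract (a per-sensor hypothesis test over several observations, a forgetting scheme, and decision fusion) can be illustrated with a minimal sketch. The abstract does not specify the statistical model, the form of the forgetting scheme, or the fusion rule, so everything below is an assumption made for illustration: a Gaussian likelihood ratio per frame, an exponential forgetting factor standing in for the forgetting scheme, and a majority vote across sensors. The names `frame_llr`, `sensor_decisions`, `fuse`, `forget`, and `threshold` are hypothetical and do not come from the paper.

```python
import math

def frame_llr(energy, noise_var):
    """Per-frame log-likelihood ratio of speech vs. noise.

    Assumes a Gaussian model (not taken from the paper): gamma is the
    a posteriori SNR and xi a maximum-likelihood a priori SNR estimate.
    """
    gamma = energy / noise_var
    xi = max(gamma - 1.0, 0.0)
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)

def sensor_decisions(energies, noise_var, forget=0.8, threshold=1.0):
    """Accumulate evidence over successive frames at one sensor.

    Older observations are discounted by an exponential forgetting
    factor (an assumed stand-in for the paper's forgetting scheme),
    so each frame costs one multiply-add instead of a windowed sum.
    """
    acc = 0.0
    decisions = []
    for energy in energies:
        acc = forget * acc + (1.0 - forget) * frame_llr(energy, noise_var)
        decisions.append(acc > threshold)
    return decisions

def fuse(per_sensor):
    """Fuse per-sensor decisions frame by frame with a majority vote (assumed rule)."""
    n_sensors = len(per_sensor)
    return [2 * sum(votes) > n_sensors for votes in zip(*per_sensor)]

# Toy input: frame energies at three sensors, with unit noise variance.
# Two sensors observe a rise in energy in the last two frames; one does not.
sensors = [
    [1.0, 1.0, 10.0, 10.0],
    [1.0, 1.0, 10.0, 10.0],
    [1.0, 1.0, 1.0, 1.0],
]
fused = fuse([sensor_decisions(e, noise_var=1.0) for e in sensors])
print(fused)  # [False, False, True, True]
```

In this sketch the recursive accumulator also introduces a short hangover after speech ends, since past evidence decays gradually rather than dropping to zero; whether the paper's scheme behaves the same way is not stated in the abstract.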
Bibliographic reference. Petsatodis, Theodoros / Talantzis, Fotios / Boukis, Christos / Tan, Zheng-Hua / Prasad, Ramjee (2011): "Multi-sensor voice activity detection based on multiple observation hypothesis testing", In INTERSPEECH-2011, 2633-2636.