In this paper, we propose two robust voice activity detection (VAD) methods for adverse environments. A single subband power distance (SPD) feature is estimated from different wavelet subbands and further improved to be robust against noise. The first method is based on a neural network that operates on an input vector consisting of the SPD feature and its first and second derivatives. The second method is an adaptive threshold-based algorithm that employs only the single SPD feature. A statistical percentile filter based on long-term information is enhanced to estimate the noise threshold more adaptively. The proposed methods are evaluated and compared with state-of-the-art VAD algorithms on the TIMIT database, which was artificially distorted by different additive noise types. The results show that the proposed VAD methods are very robust to environmental noise and mostly outperform standard VADs such as the ETSI AFE ES 202 050 and ITU-T G.729 B.
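To illustrate the general idea behind threshold-based VAD with a percentile filter, the following is a minimal sketch under stated assumptions. The abstract does not give the SPD computation or the enhanced filter, so this is not the authors' algorithm: the input is any per-frame feature sequence standing in for SPD, and the function name `percentile_threshold_vad` and the parameters `window`, `percentile`, and `margin` are hypothetical.

```python
import numpy as np

def percentile_threshold_vad(feature, window=100, percentile=10.0, margin=3.0):
    """Mark a frame as speech when its feature exceeds an adaptive noise threshold.

    feature    : 1-D sequence of per-frame values (a stand-in for the SPD feature).
    window     : number of past frames used as long-term context (assumed value).
    percentile : low percentile of the window taken as the noise floor (assumed value).
    margin     : offset added to the noise floor to form the threshold (assumed value).
    """
    feature = np.asarray(feature, dtype=float)
    decisions = np.zeros(feature.shape[0], dtype=bool)
    for t in range(feature.shape[0]):
        # Long-term context: the current frame and up to window - 1 preceding frames.
        start = max(0, t - window + 1)
        # A low percentile tracks the noise floor as long as speech does not dominate the window.
        noise_floor = np.percentile(feature[start:t + 1], percentile)
        decisions[t] = feature[t] > noise_floor + margin
    return decisions
```

Because the noise floor is re-estimated at every frame from recent history, the threshold follows slow changes in the background level without a separate noise-only training phase. The window length and percentile trade tracking speed against robustness to long speech segments.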
Bibliographic reference. Pham, Tuan Van / Stadtschnitzer, Michael / Pernkopf, Franz / Kubin, Gernot (2008): "Voice activity detection algorithms using subband power distance feature for noisy environments", In INTERSPEECH-2008, 2586-2589.