EUROSPEECH 2003 - INTERSPEECH 2003
This paper presents an efficient voice activity detector (VAD) based on estimating the long-term spectral divergence (LTSD) between noise and speech periods. The proposed method decomposes the input signal into overlapped speech frames, uses a sliding window to compute the long-term spectral envelope, and measures the speech/non-speech LTSD, thus yielding a highly discriminating decision rule and minimizing the average number of decision errors. To increase non-speech detection accuracy, the decision threshold is adapted to the measured noise energy, while a controlled hang-over is activated only when the observed signal-to-noise ratio (SNR) is low. An exhaustive analysis of the proposed VAD is carried out using the AURORA TIdigits and SpeechDat-Car (SDC) databases. The proposed VAD is compared with the most commonly used VADs in the field in terms of speech/non-speech detection and recognition performance. Experimental results demonstrate a sustained advantage over the G.729, AMR and AFE VADs.
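The processing chain summarized above (overlapped frames, a sliding-window spectral envelope, a divergence measure against the noise spectrum) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the following are assumptions: the envelope is taken as the per-bin maximum magnitude over 2*order+1 neighbouring frames, and the LTSD as the mean squared envelope-to-noise ratio in dB. The function names, the frame sizes, the noise estimate taken from the first frames, and the fixed 15 dB threshold are all hypothetical. The noise-adapted threshold and the controlled hang-over are deliberately left out.

```python
import numpy as np

def frame_spectra(signal, frame_len=256, hop=128):
    """Magnitude spectra of overlapped, Hann-windowed frames (one row per frame)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def ltsd(spectra, noise_spectrum, order=6):
    """Assumed LTSD definition: mean squared envelope/noise ratio, in dB, per frame."""
    n_frames = spectra.shape[0]
    out = np.empty(n_frames)
    for l in range(n_frames):
        # Sliding window of up to 2*order+1 frames, clipped at the signal edges.
        lo, hi = max(0, l - order), min(n_frames, l + order + 1)
        envelope = spectra[lo:hi].max(axis=0)  # long-term spectral envelope
        ratio = envelope ** 2 / (noise_spectrum ** 2 + 1e-12)
        out[l] = 10.0 * np.log10(ratio.mean())
    return out

def simple_ltsd_vad(signal, noise_frames=10, threshold_db=15.0, order=6):
    """Frame-level speech flags using a fixed threshold (assumption: the first
    `noise_frames` frames contain noise only; no adaptation, no hang-over)."""
    spectra = frame_spectra(signal)
    noise_spectrum = spectra[:noise_frames].mean(axis=0)
    return ltsd(spectra, noise_spectrum, order) > threshold_db
```

Because the envelope looks `order` frames ahead and behind, frames just before and after an active segment are also flagged. An adaptive version would replace `threshold_db` with a value chosen from the measured noise energy, as the abstract describes.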
Bibliographic reference. Ramirez, Javier / Segura, Jose C. / Benitez, Carmen / Torre, Angel de la / Rubio, Antonio J. (2003): "A new adaptive long-term spectral estimation voice activity detector", In EUROSPEECH-2003, 3041-3044.