EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

A New Adaptive Long-Term Spectral Estimation Voice Activity Detector

Javier Ramirez, Jose C. Segura, Carmen Benitez, Angel de la Torre, Antonio J. Rubio

Universidad de Granada, Spain

This paper presents an efficient voice activity detector (VAD) based on estimating the long-term spectral divergence (LTSD) between noise and speech periods. The proposed method decomposes the input signal into overlapped speech frames, uses a sliding window to compute the long-term spectral envelope, and measures the speech/non-speech LTSD, thus yielding a highly discriminating decision rule and minimizing the average number of decision errors. To increase non-speech detection accuracy, the decision threshold is adapted to the measured noise energy, while a controlled hang-over is activated only when the observed signal-to-noise ratio (SNR) is low. An exhaustive analysis of the proposed VAD is carried out using the AURORA TIdigits and SpeechDat-Car (SDC) databases. The proposed VAD is compared with the most commonly used VADs in the field in terms of speech/non-speech detection and recognition performance. Experimental results demonstrate a sustained advantage over the G.729, AMR and AFE VADs.
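The decision rule outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the envelope is the per-bin maximum of the magnitude spectrum over a window of 2 x order + 1 frames, and that the divergence is the mean envelope-to-noise power ratio in dB. The function names, the window order, and the threshold and energy constants are hypothetical placeholders, and the hang-over stage is omitted.

```python
import math


def long_term_envelope(spectra, l, order):
    """Per-bin maximum of the magnitude spectra over frames l-order .. l+order.

    The window is clipped at the start and end of the utterance.
    """
    lo = max(0, l - order)
    hi = min(len(spectra), l + order + 1)
    n_bins = len(spectra[0])
    return [max(spectra[j][k] for j in range(lo, hi)) for k in range(n_bins)]


def ltsd(spectra, noise, l, order=6):
    """Divergence in dB between the envelope around frame l and the noise spectrum.

    Assumed form: 10*log10 of the mean over bins of envelope^2 / noise^2.
    """
    env = long_term_envelope(spectra, l, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(noise)
    return 10.0 * math.log10(ratio)


def adaptive_threshold(noise_energy_db, e_clean=30.0, e_noisy=50.0,
                       t_clean=6.0, t_noisy=2.5):
    """Interpolate the threshold linearly between clean and noisy conditions.

    All four constants are illustrative placeholders, not values from the abstract.
    """
    if noise_energy_db <= e_clean:
        return t_clean
    if noise_energy_db >= e_noisy:
        return t_noisy
    frac = (noise_energy_db - e_clean) / (e_noisy - e_clean)
    return t_clean + frac * (t_noisy - t_clean)


def detect(spectra, noise, noise_energy_db, order=6):
    """Return one speech/non-speech decision per frame (True means speech)."""
    threshold = adaptive_threshold(noise_energy_db)
    return [ltsd(spectra, noise, l, order) > threshold
            for l in range(len(spectra))]


if __name__ == "__main__":
    # Ten frames of flat unit noise with one louder frame at index 5.
    frames = [[1.0] * 4 for _ in range(10)]
    frames[5] = [10.0] * 4
    print(detect(frames, [1.0] * 4, noise_energy_db=40.0, order=1))
```

Because the envelope takes a maximum over neighbouring frames, a single high-energy frame raises the divergence of the frames on either side of it as well, which is how a sliding window smooths the decision around speech onsets and offsets.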


Bibliographic reference. Ramirez, Javier / Segura, Jose C. / Benitez, Carmen / Torre, Angel de la / Rubio, Antonio J. (2003): "A new adaptive long-term spectral estimation voice activity detector", in EUROSPEECH-2003, 3041-3044.