INTERSPEECH 2006 - ICSLP
We propose a model-based voice activity detector (VAD) derived from the Vector Taylor Series (VTS) approach. A Gaussian mixture trained on clean speech provides the decision rule for speech/non-speech detection. The VTS approach then adapts the Gaussian mixture to the noise conditions, yielding stable performance over a wide range of SNRs. We have evaluated the proposed VAD both for speech/non-speech detection and for its application to robust speech recognition. Compared with other VAD methods, the proposed VAD shows the best trade-off in speech/non-speech detection. When applied to Wiener filtering and to frame dropping, it also provides the best recognition results.
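The general idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes log-spectral features, a diagonal-covariance Gaussian mixture, a single Gaussian noise model, and a zeroth-order VTS approximation that adapts only the means through y = log(exp(x) + exp(n)). The function names, the log-likelihood-ratio decision rule, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def vts_adapt_means(clean_means, noise_mean):
    # Zeroth-order VTS (assumption): evaluate y = log(exp(x) + exp(n))
    # at the clean-speech means and the noise mean, per dimension.
    return np.logaddexp(clean_means, noise_mean)

def diag_gauss_logpdf(x, mean, var):
    # Log-density of a diagonal-covariance Gaussian, summed over dimensions.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def gmm_loglik(x, weights, means, variances):
    # Log-likelihood of one frame under a diagonal GMM (log-sum-exp over components).
    comp = np.log(weights) + diag_gauss_logpdf(x, means, variances)
    return np.logaddexp.reduce(comp)

def vad_decision(frame, weights, clean_means, variances,
                 noise_mean, noise_var, threshold=0.0):
    # Hypothetical decision rule: compare the noise-adapted speech model
    # against the noise model with a log-likelihood ratio test.
    noisy_means = vts_adapt_means(clean_means, noise_mean)
    ll_speech = gmm_loglik(frame, weights, noisy_means, variances)
    ll_noise = diag_gauss_logpdf(frame, noise_mean, noise_var)
    return bool(ll_speech - ll_noise > threshold)

# Toy two-component, two-dimensional model (made-up numbers).
weights = np.array([0.5, 0.5])
clean_means = np.array([[10.0, 10.0], [12.0, 12.0]])
variances = np.ones((2, 2))
noise_mean = np.array([2.0, 2.0])
noise_var = np.ones(2)

speech_like = vad_decision(np.array([10.0, 10.0]), weights, clean_means,
                           variances, noise_mean, noise_var)
noise_like = vad_decision(np.array([2.0, 2.0]), weights, clean_means,
                          variances, noise_mean, noise_var)
```

In this sketch the variances are left unadapted for brevity. A first-order VTS expansion would also transform them, and the noise model would normally be estimated from the signal rather than fixed in advance.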
Bibliographic reference. Torre, Ángel de la / Ramírez, Javier / Benítez, Carmen / Segura, José C. / García, L. / Rubio, Antonio J. (2006): "Noise robust model-based voice activity detection", In INTERSPEECH-2006, paper 1476-Wed3A1O.1.