9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Cepstral Domain Voice Activity Detection for Improved Noise Modeling in MMSE Feature Enhancement for ASR

Svein Gunnar Pettersen, Magne Hallstein Johnsen

NTNU, Norway

In this paper we investigate the use of voice activity detection (VAD) for improving the noise models used in cepstral domain minimum mean squared error (MMSE) filtering of noisy speech. Because mel-frequency cepstral coefficient (MFCC) features are so widely used for speech recognition, it is useful to have VAD methods and MMSE filtering algorithms that both work in the MFCC domain. We propose a VAD method based on the likelihood ratio test (LRT) that works directly on MFCC feature vectors. Detected noise-only frames are collected and used to create a noise model, which is then used for MMSE filtering. Finally, speech recognition is run using models trained in clean conditions. Experiments on AURORA2 show that our approach yields a better noise model than the common approach of simply using the first few frames of each file for noise modeling, and that the proposed VAD method performs comparably to a well-known LRT-based VAD algorithm that works in the DFT domain.
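
As a rough illustration of the noise-modeling step described above, and not the authors' implementation, the following Python sketch applies a likelihood ratio test to feature vectors and re-estimates a noise model from the frames classified as noise-only. It assumes that speech and noise are each modeled by a single diagonal-covariance Gaussian and that a frame is noise-only when the log-likelihood ratio falls below a threshold; the function names, the threshold, and the variance floor are all hypothetical choices made for illustration. The MMSE filtering step itself is not shown.

```python
import numpy as np


def diag_gauss_loglik(frames, mean, var):
    """Per-frame log-likelihood under a diagonal-covariance Gaussian.

    frames: array of shape (num_frames, dim); mean, var: arrays of shape (dim,).
    """
    diff = frames - mean
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)


def lrt_noise_mask(frames, noise_model, speech_model, threshold=0.0):
    """Mark frames as noise-only with a log-likelihood ratio test.

    Each model is a (mean, var) pair (an assumption of this sketch).
    Returns a boolean array that is True where
    log p(x | speech) - log p(x | noise) < threshold.
    """
    llr = (diag_gauss_loglik(frames, *speech_model)
           - diag_gauss_loglik(frames, *noise_model))
    return llr < threshold


def fit_noise_model(frames, noise_mask, var_floor=1e-3):
    """Estimate a (mean, var) noise model from the detected noise-only frames."""
    noise_frames = frames[noise_mask]
    if len(noise_frames) < 2:
        raise ValueError("too few noise-only frames to estimate a noise model")
    mean = noise_frames.mean(axis=0)
    # Floor the variance so that later likelihood computations stay finite.
    var = np.maximum(noise_frames.var(axis=0), var_floor)
    return mean, var
```

In use, an initial noise model could be taken from the first few frames of a file, the test run over all frames, and the noise model then refitted on every frame flagged as noise-only.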


Bibliographic reference. Pettersen, Svein Gunnar / Johnsen, Magne Hallstein (2008): "Cepstral domain voice activity detection for improved noise modeling in MMSE feature enhancement for ASR", in INTERSPEECH-2008, 1012-1015.