In this paper we investigate the use of voice activity detection (VAD) to improve the noise models used for cepstral-domain minimum mean squared error (MMSE) filtering of noisy speech. Because MFCC features are so widely used for speech recognition, it is useful to have VAD methods and MMSE filtering algorithms that both work in the MFCC domain. We propose a VAD method based on the likelihood ratio test (LRT) that works directly on MFCC feature vectors. Frames detected as noise-only are collected and used to build a noise model, which is then used for MMSE filtering. Finally, speech recognition is run using models trained in clean conditions. Experiments on AURORA2 show that our approach improves the noise model compared to the common approach of simply using the first few frames of each file for noise modeling, and that the proposed VAD method performs comparably to a well-known LRT-based VAD algorithm that works in the DFT domain.
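The noise-modeling idea summarized above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes single diagonal-covariance Gaussians for both the speech and noise hypotheses, a speech model that is already known (for example, estimated from clean training data), a zero log-likelihood-ratio threshold, and synthetic vectors standing in for MFCC frames. All function names and parameters (`fit_diag_gaussian`, `lrt_vad`, `refine_noise_model`, `n_init`, `var_floor`) are hypothetical, and the code uses only the Python standard library to stay dependency-free.

```python
import math
import random


def fit_diag_gaussian(frames, var_floor=1e-3):
    """Fit per-dimension mean and (floored) variance to a list of feature vectors."""
    n = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
           for d in range(dims)]
    return mean, var


def log_likelihood(frame, model):
    """Log-likelihood of one frame under a diagonal-covariance Gaussian."""
    mean, var = model
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(frame, mean, var))


def lrt_vad(frames, speech_model, noise_model, threshold=0.0):
    """Flag a frame as speech when its log-likelihood ratio exceeds the threshold."""
    return [log_likelihood(f, speech_model) - log_likelihood(f, noise_model) > threshold
            for f in frames]


def refine_noise_model(frames, speech_model, n_init=10, threshold=0.0):
    """Bootstrap a noise model from the first n_init frames, then re-estimate it
    from every frame the LRT labels as noise-only."""
    initial = fit_diag_gaussian(frames[:n_init])
    is_speech = lrt_vad(frames, speech_model, initial, threshold)
    noise_frames = [f for f, s in zip(frames, is_speech) if not s]
    # Fall back to the initial estimate if too few noise frames were detected.
    if len(noise_frames) < n_init:
        return initial, is_speech
    return fit_diag_gaussian(noise_frames), is_speech


def make_toy_frames(seed=0, dims=13):
    """Synthetic stand-in for MFCC frames: noise, then 'speech', then noise."""
    rng = random.Random(seed)

    def draw(mu, n):
        return [[rng.gauss(mu, 1.0) for _ in range(dims)] for _ in range(n)]

    return draw(0.0, 30) + draw(5.0, 40) + draw(0.0, 30)


frames = make_toy_frames()
# Assumed to be known in advance, e.g. estimated from clean training data.
speech_model = ([5.0] * 13, [1.0] * 13)
noise_model, is_speech = refine_noise_model(frames, speech_model)
print(sum(is_speech), "frames flagged as speech out of", len(frames))
```

In this toy setup the refined noise model draws on the trailing noise-only frames as well as the leading ones, which is the advantage over using only the first few frames of each file. A real system would replace the single Gaussians with whatever speech and noise models the MMSE filter uses and would tune the threshold on held-out data.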
Bibliographic reference. Pettersen, Svein Gunnar / Johnsen, Magne Hallstein (2008): "Cepstral domain voice activity detection for improved noise modeling in MMSE feature enhancement for ASR", In INTERSPEECH-2008, 1012-1015.