Voice activity detection (VAD) is mainly used to detect speech/nonspeech periods in observed noisy signals. The detected periods are used to reduce noise components or enhance speech components in noisy speech. However, current VAD techniques have serious problems in that the accuracy of detection of speech/non-speech periods drastically reduces if they are used for noisy speech and/or for mixtures of non-speech such as those in musical and environmental sounds. Thus, VAD needs to be robust to enable speech periods to be accurately detected in these situations. This paper proposes concurrent processing of VAD and noise reduction (NR) using empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to simultaneously resolve these problems. The proposed method effectively works on reducing stationary background noise by using EMD without estimating SNR (noise conditions), and then on reducing non-stationary noise including non-speech components by using MSA while this is determining speech/non-speech periods by thresholding the noise-reduced speech. Three experiments on VAD/NR in real environments were conducted to evaluate the proposed method by comparing it with typical methods (Otsu's method, G.729B, and AMR) and our previous methods. The results demonstrated that the proposed method could accurately detect speech/non-speech periods and effectively reduce noise components simultaneously.
Bibliographic reference. Kanai, Yasuaki / Morita, Shota / Unoki, Masashi (2013): "Concurrent processing of voice activity detection and noise reduction using empirical mode decomposition and modulation spectrum analysis", In INTERSPEECH-2013, 742-746.