ISCA Archive Interspeech 2013

Concurrent processing of voice activity detection and noise reduction using empirical mode decomposition and modulation spectrum analysis

Yasuaki Kanai, Shota Morita, Masashi Unoki

Voice activity detection (VAD) is mainly used to detect speech/non-speech periods in observed noisy signals. The detected periods are used to reduce noise components or enhance speech components in noisy speech. However, current VAD techniques have a serious problem: the accuracy of speech/non-speech detection drops drastically when they are applied to noisy speech and/or to mixtures containing non-speech sounds such as music and environmental sounds. VAD therefore needs to be robust enough to detect speech periods accurately in these situations. This paper proposes concurrent processing of VAD and noise reduction (NR) using empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to resolve these problems simultaneously. The proposed method effectively reduces stationary background noise by using EMD without estimating the SNR (noise conditions), and then reduces non-stationary noise, including non-speech components, by using MSA while determining speech/non-speech periods by thresholding the noise-reduced speech. Three experiments on VAD/NR in real environments were conducted to evaluate the proposed method by comparing it with typical methods (Otsu's method, G.729B, and AMR) and our previous methods. The results demonstrated that the proposed method could accurately detect speech/non-speech periods and effectively reduce noise components simultaneously.


doi: 10.21437/Interspeech.2013-206

Cite as: Kanai, Y., Morita, S., Unoki, M. (2013) Concurrent processing of voice activity detection and noise reduction using empirical mode decomposition and modulation spectrum analysis. Proc. Interspeech 2013, 742-746, doi: 10.21437/Interspeech.2013-206

@inproceedings{kanai13_interspeech,
  author={Yasuaki Kanai and Shota Morita and Masashi Unoki},
  title={{Concurrent processing of voice activity detection and noise reduction using empirical mode decomposition and modulation spectrum analysis}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={742--746},
  doi={10.21437/Interspeech.2013-206}
}