ISCA Archive Interspeech 2005

Voice activity detection based on optimally weighted combination of multiple features

Yusuke Kida, Tatsuya Kawahara

This paper presents a noise-robust voice activity detection (VAD) scheme based on an optimally weighted combination of features. The scheme uses a weighted combination of four conventional VAD features: amplitude level, zero crossing rate, spectral information, and Gaussian mixture model (GMM) likelihood. In effect, this combination selects the optimal method for the current noise condition. The combination weights are updated using minimum classification error (MCE) training. An experimental evaluation in three types of noisy environments demonstrated the noise robustness of the proposed method. Adapting the feature weights improved detection performance, and the adaptation required only ten or fewer training utterances.
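The abstract does not give the exact scoring or update equations, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each of the four features has already been converted into a per-frame score where positive values suggest speech. The scores are combined linearly with weights, and the weights are adjusted by MCE-style gradient descent on a sigmoid-smoothed misclassification measure. The function names (`combined_score`, `mce_update`, `is_speech`), the learning rate `lr`, the smoothing slope `gamma`, the zero decision threshold, and the non-negative, sum-to-one weight constraint are all hypothetical and do not describe the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical labels for the four per-frame feature scores named in the abstract.
FEATURES = ["amplitude", "zero_crossing_rate", "spectral", "gmm_likelihood"]


def combined_score(weights: Sequence[float], scores: Sequence[float]) -> float:
    """Weighted sum of per-feature scores (assumed: > 0 means speech)."""
    return sum(w * s for w, s in zip(weights, scores))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mce_update(weights: Sequence[float], frames: Sequence[Sequence[float]],
               labels: Sequence[int], lr: float = 0.1,
               gamma: float = 1.0) -> List[float]:
    """One pass of MCE-style gradient descent over labelled frames.

    labels: +1 for speech frames, -1 for non-speech frames (assumption).
    """
    w = list(weights)
    for scores, y in zip(frames, labels):
        d = -y * combined_score(w, scores)   # misclassification measure: > 0 means wrong
        loss = sigmoid(gamma * d)            # smoothed 0/1 classification error
        coef = gamma * loss * (1.0 - loss)   # derivative of the loss w.r.t. d
        for k, s in enumerate(scores):
            w[k] -= lr * coef * (-y * s)     # d(d)/d(w_k) = -y * s_k
        # Assumed constraint: keep weights non-negative and summing to one.
        w = [max(v, 0.0) for v in w]
        total = sum(w)
        w = [v / total for v in w] if total > 0 else [1.0 / len(w)] * len(w)
    return w


def is_speech(weights: Sequence[float], scores: Sequence[float]) -> bool:
    """Frame-level VAD decision with an assumed threshold of zero."""
    return combined_score(weights, scores) > 0.0


if __name__ == "__main__":
    # Toy data: feature 0 agrees with the label, feature 1 is misleading (e.g. noise-corrupted).
    frames = [[1.0, -1.0, 0.0, 0.0], [-1.0, 1.0, 0.0, 0.0]] * 10
    labels = [1, -1] * 10
    trained = mce_update([0.25] * 4, frames, labels)
    print(dict(zip(FEATURES, (round(v, 3) for v in trained))))
```

In this toy run, the weight shifts toward the feature whose score agrees with the labels and away from the misleading one. This mirrors the abstract's point that the weighted combination effectively selects the method that suits the noise condition. A handful of labelled utterances supplies the frames for such an update. The authors' actual loss, constraints, and feature normalization may differ.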


doi: 10.21437/Interspeech.2005-244

Cite as: Kida, Y., Kawahara, T. (2005) Voice activity detection based on optimally weighted combination of multiple features. Proc. Interspeech 2005, 2621-2624, doi: 10.21437/Interspeech.2005-244

@inproceedings{kida05_interspeech,
  author={Yusuke Kida and Tatsuya Kawahara},
  title={{Voice activity detection based on optimally weighted combination of multiple features}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2621--2624},
  doi={10.21437/Interspeech.2005-244}
}