Interspeech'2005 - Eurospeech
This paper presents a noise-robust voice activity detection (VAD) scheme based on an optimally weighted combination of four conventional VAD features: amplitude level, zero crossing rate, spectral information, and Gaussian mixture model likelihood. This combination in effect selects the optimal method depending on the noise condition. The weights for the combination are updated using minimum classification error (MCE) training. An experimental evaluation under three types of noisy environment demonstrated the noise robustness of the proposed method. Adapting the feature weights was shown to enhance the detection ability, and the adaptation was shown to be possible using ten or fewer training utterances.
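The abstract gives no formulas, so the following is only a minimal sketch of the general idea: a speech/non-speech decision made from a weighted sum of four per-frame feature scores, with the weights adjusted by one gradient step on a sigmoid-smoothed classification error in the usual MCE style. The function names, the threshold, the learning rate `lr`, the sigmoid slope `gamma`, and the renormalisation of the weights to sum to one are all assumptions made for illustration, not details taken from the paper.

```python
import math

def combined_score(weights, features):
    """Weighted sum of per-frame feature scores (hypothetical form)."""
    return sum(w * f for w, f in zip(weights, features))

def is_speech_frame(weights, features, threshold=0.0):
    """Declare speech when the combined score exceeds the threshold."""
    return combined_score(weights, features) - threshold > 0

def mce_step(weights, features, is_speech, threshold=0.0, lr=0.1, gamma=1.0):
    """One gradient step on a sigmoid-smoothed classification error.

    The misclassification measure d is positive when the frame is on the
    wrong side of the threshold. All constants here are assumed values.
    """
    g = combined_score(weights, features) - threshold
    sign = -1.0 if is_speech else 1.0       # d = -g for speech, +g for non-speech
    d = sign * g
    loss = 1.0 / (1.0 + math.exp(-gamma * d))
    slope = gamma * loss * (1.0 - loss)     # derivative of the sigmoid w.r.t. d
    updated = [w - lr * slope * sign * f for w, f in zip(weights, features)]
    # Assumption: keep weights non-negative and normalised to sum to one.
    updated = [max(w, 0.0) for w in updated]
    total = sum(updated)
    return [w / total for w in updated] if total > 0 else list(weights)

# Order of scores (assumed): amplitude, zero crossing rate, spectral, GMM likelihood.
weights = [0.25, 0.25, 0.25, 0.25]
frame = [1.0, -0.5, 0.2, 0.8]               # made-up scores for a speech frame
new_weights = mce_step(weights, frame, is_speech=True)
```

In this toy example the step shifts weight toward the features that scored the speech frame correctly and away from the one that did not, which is one way to read the abstract's remark that the combination in effect selects the best method for the noise condition.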
Bibliographic reference. Kida, Yusuke / Kawahara, Tatsuya (2005): "Voice activity detection based on optimally weighted combination of multiple features", In INTERSPEECH-2005, 2621-2624.