We describe a voice activity detection algorithm that significantly improves a narrow-bandwidth speaker verification system in harsh environments. The algorithm is based on a time-scale feature extracted from wavelet subbands. A statistical quantile filtering technique is proposed to estimate an adaptive noise threshold, and a hang-over scheme is then applied to bridge short pauses between speech frames. The optimized voice activity detector is embedded in the front-end unit of the narrow-bandwidth speaker verification system. The proposed algorithm is evaluated in objective tests on band-pass filtered utterances from the TIMIT database that were artificially corrupted by different types of additive noise. It is also tested on the band-pass filtered SPEECHDAT-AT and WSJ0 databases in terms of speaker verification rates. The algorithm's superior performance is attributed to the robust time-scale feature and the adaptive threshold.
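The thresholding and hang-over steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameter values, so the function names, the sliding window length, the quantile `q`, the scaling factor, and the hang-over length are all assumptions chosen for illustration. The per-frame feature is taken as given (any non-negative frame-level value stands in for the wavelet-based time-scale feature).

```python
def quantile_threshold(features, window=50, q=0.5, scale=2.0):
    """Per-frame threshold: scale * q-quantile of the last `window` feature values.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    thresholds = []
    for i in range(len(features)):
        # Sort the most recent frames and pick the q-quantile as a noise-level estimate.
        recent = sorted(features[max(0, i - window + 1): i + 1])
        idx = int(q * (len(recent) - 1))
        thresholds.append(scale * recent[idx])
    return thresholds


def apply_hangover(raw, hangover=3):
    """Hold the 'speech' decision for up to `hangover` frames after a speech frame."""
    out = []
    remaining = 0
    for is_speech in raw:
        if is_speech:
            remaining = hangover  # restart the hold counter on every speech frame
            out.append(True)
        elif remaining > 0:
            remaining -= 1  # a pause within the hold period is still labeled speech
            out.append(True)
        else:
            out.append(False)
    return out


def detect_voice(features, window=50, q=0.5, scale=2.0, hangover=3):
    """Compare each frame's feature with its adaptive threshold, then apply hang-over."""
    thresholds = quantile_threshold(features, window, q, scale)
    raw = [f > t for f, t in zip(features, thresholds)]
    return apply_hangover(raw, hangover)


if __name__ == "__main__":
    # Synthetic feature track: noise floor, two speech bursts split by a short pause.
    features = [1.0] * 10 + [10.0] * 3 + [1.0] * 2 + [10.0] * 3 + [1.0] * 10
    print(detect_voice(features))
```

Because the threshold follows a quantile of recent frames rather than a fixed value, it can adapt as the background noise level changes. In this sketch the hang-over bridges pauses no longer than `hangover` frames and also extends each speech segment by the same number of frames at its end.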
Cite as: Pham, T.V., Neffe, M., Kubin, G. (2007) Robust voice activity detection for narrow-bandwidth speaker verification under adverse environments. Proc. Interspeech 2007, 2037-2040, doi: 10.21437/Interspeech.2007-169
@inproceedings{pham07_interspeech,
  author={Tuan Van Pham and Michael Neffe and Gernot Kubin},
  title={{Robust voice activity detection for narrow-bandwidth speaker verification under adverse environments}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2037--2040},
  doi={10.21437/Interspeech.2007-169}
}