Accurate speech activity detection is a challenging problem in the car environment, where high background noise and high-amplitude transient sounds are common. We investigate a number of features designed to capture the harmonic structure of speech. We separately evaluate three important characteristics of these features: 1) discriminative power, 2) robustness to greatly varying SNR and channel characteristics, and 3) performance when used in conjunction with MFCC features. We propose a new feature, the Windowed Autocorrelation Lag Energy (WALE), which has desirable properties.
Cite as: Kristjansson, T., Deligne, S., Olsen, P. (2005) Voicing features for robust speech detection. Proc. Interspeech 2005, 369-372, doi: 10.21437/Interspeech.2005-186
@inproceedings{kristjansson05_interspeech,
  author={Trausti Kristjansson and Sabine Deligne and Peder Olsen},
  title={{Voicing features for robust speech detection}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={369--372},
  doi={10.21437/Interspeech.2005-186}
}