The task of robustly detecting distant speech in low SNR environments for automatic speech recognition is examined using a two-stage approach based on two distinguishing features of speech, namely harmonicity and modulation frequency (MF). A modified metric for harmonicity is used as a gating function to a set of parallel classifiers that incorporate MFs computed on different frequency bands. Performance is evaluated on both the frame-level discriminative power and also the system level ASR results on a real-world robotic forklift task. Compared to other previously proposed features such as relative spectral entropy, and classification strategies involving MFs, the combined approach shows good generalization across different kinds of dynamic noise conditions, and obtains a significant improvement on the false alarm rate at low speech miss rate settings. The overall ASR results also improved significantly compared to the ESTI AMR-VAD2, while reducing the number of false alarms by a factor of two.
Bibliographic reference. Chuangsuwanich, Ekapol / Glass, James (2011): "Robust voice activity detector for real world applications using harmonicity and modulation frequency", In INTERSPEECH-2011, 2645-2648.