INTERSPEECH 2010
11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Adaptive High Accuracy Approaches to Speech Activity Detection in Noisy and Hostile Audio Environments

Mark Huggins (1), Brett Smolenski (2), Aaron Lawson (2)

(1) Oasis Systems Inc., USA
(2) RADC Inc., USA

This study examines the difficult task of Speech Activity Detection (SAD) in two hostile environments: AM push-to-talk air traffic control audio and international telephone conversations with very low SNRs. Because traditional energy-based SAD performs poorly in these conditions, two novel SAD approaches were developed that specifically target the spectral characteristics typical of speech, rather than trying to separate out the background, which can vary enormously. As a result, these approaches are inherently adaptive to their environments. This paper describes a Speech Energy Resonance Band Detection approach and a Harmonic Product Spectrum clustering approach to SAD, and evaluates their performance against MIT Xtalk and the Teager Energy Operator (TEO) in clean and hostile environments.
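For orientation, the sketch below shows the textbook forms of two building blocks named above: the Harmonic Product Spectrum (HPS), which rewards frequencies whose integer multiples all carry energy (a trait of voiced speech), and the discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1]. It is a minimal illustration under stated assumptions, not the authors' Resonance Band Detection or HPS clustering method, whose details this abstract does not give. The function names (`teager_energy`, `magnitude_spectrum`, `hps_pitch`) and the choice of 3 harmonics are hypothetical, and a naive pure-Python DFT is used only to keep the example dependency-free.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample (len(x) - 2 values).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 via a naive O(N^2) DFT (fine for short frames)."""
    n_samples = len(frame)
    mags = []
    for k in range(n_samples // 2 + 1):
        re = 0.0
        im = 0.0
        for n, sample in enumerate(frame):
            angle = 2.0 * math.pi * k * n / n_samples
            re += sample * math.cos(angle)
            im -= sample * math.sin(angle)
        mags.append(math.hypot(re, im))
    return mags


def hps_pitch(frame, sample_rate, num_harmonics=3):
    """Estimate the fundamental frequency (Hz) with a Harmonic Product Spectrum.

    P[k] = |X[k]| * |X[2k]| * ... * |X[R*k]|, with R = num_harmonics.
    The bin that maximizes P is taken as the fundamental.
    """
    mags = magnitude_spectrum(frame)
    max_bin = (len(mags) - 1) // num_harmonics  # keep r * k inside the spectrum
    best_bin, best_value = 1, -1.0
    for k in range(1, max_bin + 1):  # skip the DC bin (k = 0)
        product = 1.0
        for r in range(1, num_harmonics + 1):
            product *= mags[r * k]
        if product > best_value:
            best_bin, best_value = k, product
    return best_bin * sample_rate / len(frame)


if __name__ == "__main__":
    fs = 8000
    # Synthetic "voiced" frame: 200 Hz fundamental plus 400 Hz and 600 Hz harmonics.
    frame = [sum(math.sin(2 * math.pi * h * 200 * n / fs) / h for h in (1, 2, 3))
             for n in range(400)]
    print(hps_pitch(frame, fs))  # 200.0

    # For a pure tone A*cos(w*n + phi), the TEO is constant and equals A^2 * sin(w)^2.
    tone = [math.cos(0.3 * n + 0.1) for n in range(50)]
    print(teager_energy(tone)[0], math.sin(0.3) ** 2)  # the two values agree
```

In an SAD setting, such per-frame measures would typically be computed on short overlapping frames and then thresholded or clustered; how the paper does that is not stated in the abstract.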


Bibliographic reference. Huggins, Mark / Smolenski, Brett / Lawson, Aaron (2010): "Adaptive high accuracy approaches to speech activity detection in noisy and hostile audio environments", In INTERSPEECH-2010, 3094-3097.