A classification method is presented that detects the presence of speech embedded in a real acoustic background of non-speech sounds. The features used for classification are modulation components extracted by computing the amplitude modulation spectrogram. Feature selection techniques and support vector classification are employed to identify the modulation components that are most salient for the classification task and can therefore be considered highly characteristic of speech. Results show that reliable detection of speech can be performed with fewer than 10 optimally selected modulation features; the most important ones are located in the modulation frequency range below 10 Hz. Speech in a background of non-speech signals is detected with about 90% test-data accuracy at a signal-to-noise ratio of 0 dB. Compared to standard ITU-T G.729 Annex B voice activity detection, the proposed method achieves a higher true positive rate and a lower rate of false positives induced by the real acoustic background.
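To make the feature type concrete, the following is a minimal sketch of one common way to compute an amplitude modulation spectrogram: a short-time Fourier transform yields sub-band envelopes, and a second Fourier transform along time yields the modulation spectrum of each band. The function name, the frame length, hop size, number of bands, and the linear band grouping are all illustrative assumptions and are not taken from the paper; the authors' actual filterbank and parameters may differ.

```python
import numpy as np

def amplitude_modulation_spectrogram(x, fs, frame_len=256, hop=64, n_bands=8):
    """Illustrative AMS sketch (hypothetical parameters, not the paper's).

    Returns (ams, mod_freqs), where ams has shape (n_bands, n_mod_bins)
    and mod_freqs holds the modulation frequency in Hz of each bin.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")

    # Step 1: short-time magnitude spectrum with a Hann window.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_freq_bins)

    # Step 2: sum FFT bins into coarse, linearly spaced bands (assumption);
    # each row of env is the temporal envelope of one band.
    edges = np.linspace(0, spec.shape[1], n_bands + 1).astype(int)
    env = np.stack(
        [spec[:, lo:hi].sum(axis=1) for lo, hi in zip(edges[:-1], edges[1:])]
    )  # (n_bands, n_frames)

    # Step 3: remove the mean of each envelope, then take a second FFT
    # along time to obtain the modulation spectrum of each band.
    env = env - env.mean(axis=1, keepdims=True)
    ams = np.abs(np.fft.rfft(env * np.hanning(n_frames), axis=1))

    # Envelopes are sampled once per hop, so their sampling rate is fs / hop.
    mod_freqs = np.fft.rfftfreq(n_frames, d=hop / fs)
    return ams, mod_freqs
```

For a 1 kHz tone whose amplitude is modulated at 4 Hz, the band containing the carrier should show its strongest modulation component near 4 Hz, which lies in the sub-10 Hz range that the abstract identifies as most informative. Selecting a small subset of such components and training a support vector classifier on them would be a separate step that is not shown here.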
Bibliographic reference. Anemüller, Jörn / Schmidt, Denny / Bach, Jörg-Hendrik (2008): "Detection of speech embedded in real acoustic background based on amplitude modulation spectrogram features", In INTERSPEECH-2008, 2582-2585.