14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

All for One: Feature Combination for Highly Channel-Degraded Speech Activity Detection

Martin Graciarena (1), Abeer Alwan (2), Dan Ellis (3), Horacio Franco (1), Luciana Ferrer (1), John H. L. Hansen (4), Adam Janin (5), Byung-Suk Lee (3), Yun Lei (1), Vikramjit Mitra (1), Nelson Morgan (5), Seyed Omid Sadjadi (4), T. J. Tsai (5), Nicolas Scheffer (1), Lee Ngee Tan (2), Benjamin Williams (1)

(1) SRI International, USA
(2) University of California at Los Angeles, USA
(3) Columbia University, USA
(4) University of Texas at Dallas, USA

Speech activity detection (SAD) on channel transmissions is a critical preprocessing task for speech, speaker, and language recognition, or for further human analysis. This paper presents a feature combination approach to improve SAD on highly channel-degraded speech as part of the Defense Advanced Research Projects Agency's (DARPA) Robust Automatic Transcription of Speech (RATS) program. The key contribution is an exploration of combinations of novel SAD features, based on pitch and spectro-temporal processing, with the standard Mel Frequency Cepstral Coefficients (MFCC) acoustic feature. The SAD features are: (1) a GABOR feature representation followed by a multilayer perceptron (MLP); (2) a feature that combines multiple voicing features and spectral flux measures (Combo); (3) a feature based on subband autocorrelation (SAcC) with MLP postprocessing; and (4) a multiband comb-filter F0 (MBCombF0) voicing measure. We present results for single features, pairwise combinations, and the combination of all features. Pairwise feature-level combination yields high error reductions over the MFCC baseline, and the best performance is achieved by combining all features.
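As an illustration of what feature-level combination can look like, the sketch below concatenates per-frame vectors from several feature streams into a single vector per frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not describe the combination procedure in detail, so frame-wise concatenation of time-aligned streams is assumed here, and the function name `combine_features`, the stream names, and the toy values are all hypothetical.

```python
from typing import Dict, List, Sequence

Frame = List[float]


def combine_features(
    streams: Dict[str, Sequence[Sequence[float]]],
    order: Sequence[str],
) -> List[Frame]:
    """Concatenate per-frame vectors from several feature streams.

    Assumption: all streams are already time-aligned, i.e. frame t of
    every stream describes the same stretch of audio.
    """
    if not order:
        raise ValueError("at least one stream is required")

    n_frames = len(streams[order[0]])
    for name in order:
        if len(streams[name]) != n_frames:
            raise ValueError(
                f"stream {name!r} has {len(streams[name])} frames, "
                f"expected {n_frames}"
            )

    combined: List[Frame] = []
    for t in range(n_frames):
        frame: Frame = []
        for name in order:
            # Append this stream's dimensions to the combined frame vector.
            frame.extend(streams[name][t])
        combined.append(frame)
    return combined


# Toy data (hypothetical values): three frames, a 2-dimensional "mfcc"
# stream and a 1-dimensional "voicing" stream.
toy_streams = {
    "mfcc": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    "voicing": [[0.1], [0.9], [0.8]],
}
combined = combine_features(toy_streams, order=["mfcc", "voicing"])
```

Each combined frame carries the dimensions of every stream in the given order, so a single downstream classifier can be trained on the joint vector. Choosing a different subset in `order` corresponds to the single, pairwise, and all-feature configurations mentioned in the abstract.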


Bibliographic reference. Graciarena, Martin / Alwan, Abeer / Ellis, Dan / Franco, Horacio / Ferrer, Luciana / Hansen, John H. L. / Janin, Adam / Lee, Byung-Suk / Lei, Yun / Mitra, Vikramjit / Morgan, Nelson / Sadjadi, Seyed Omid / Tsai, T. J. / Scheffer, Nicolas / Tan, Lee Ngee / Williams, Benjamin (2013): "All for one: feature combination for highly channel-degraded speech activity detection", In INTERSPEECH-2013, 709-713.