ISCA Archive Interspeech 2013

All for one: feature combination for highly channel-degraded speech activity detection

Martin Graciarena, Abeer Alwan, Dan Ellis, Horacio Franco, Luciana Ferrer, John H. L. Hansen, Adam Janin, Byung-Suk Lee, Yun Lei, Vikramjit Mitra, Nelson Morgan, Seyed Omid Sadjadi, T. J. Tsai, Nicolas Scheffer, Lee Ngee Tan, Benjamin Williams

Speech activity detection (SAD) on channel transmissions is a critical preprocessing task for speech, speaker and language recognition or for further human analysis. This paper presents a feature combination approach to improve SAD on highly channel-degraded speech as part of the Defense Advanced Research Projects Agency's (DARPA) Robust Automatic Transcription of Speech (RATS) program. The key contribution is an exploration of feature combinations that join novel SAD features, based on pitch and spectro-temporal processing, with the standard Mel Frequency Cepstral Coefficients (MFCC) acoustic feature. The SAD features are: (1) a GABOR feature representation, followed by a multilayer perceptron (MLP); (2) a feature that combines multiple voicing features and spectral flux measures (Combo); (3) a feature based on subband autocorrelation (SAcC) with MLP postprocessing; and (4) a multiband comb-filter F0 (MBCombF0) voicing measure. We present results for single features, pairwise combinations and the combination of all features. Pairwise feature-level combination yields high error reductions over the MFCC baseline, and the best performance is achieved by combining all features.
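
As a rough illustration of what feature-level combination can look like, the sketch below concatenates frame-aligned feature streams into one vector per frame before any classifier sees them. It is a minimal example under stated assumptions, not the authors' implementation: the function name `combine_features`, the stream names and the toy dimensions are all hypothetical, and the abstract does not specify how the streams are aligned or normalized.

```python
from typing import Dict, List, Sequence

Frame = List[float]


def combine_features(streams: Dict[str, Sequence[Sequence[float]]]) -> List[Frame]:
    """Concatenate frame-aligned feature streams into one vector per frame.

    Assumption: every stream has already been computed at the same frame
    rate, so frame t of one stream corresponds to frame t of the others.
    """
    lengths = {len(stream) for stream in streams.values()}
    if len(lengths) != 1:
        raise ValueError("all streams must have the same number of frames")
    n_frames = lengths.pop()

    # Sort the stream names so the layout of the combined vector is stable.
    names = sorted(streams)
    return [
        [value for name in names for value in streams[name][t]]
        for t in range(n_frames)
    ]


# Toy data (hypothetical): two frames, a 3-dimensional "mfcc" stream and a
# 1-dimensional "voicing" stream standing in for a voicing measure.
streams = {
    "mfcc": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "voicing": [[0.9], [0.1]],
}
combined = combine_features(streams)
```

Each combined frame here has four dimensions (three from the hypothetical `mfcc` stream plus one from `voicing`); a downstream speech/non-speech classifier would then be trained on these concatenated vectors.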


doi: 10.21437/Interspeech.2013-199

Cite as: Graciarena, M., Alwan, A., Ellis, D., Franco, H., Ferrer, L., Hansen, J.H.L., Janin, A., Lee, B.-S., Lei, Y., Mitra, V., Morgan, N., Sadjadi, S.O., Tsai, T.J., Scheffer, N., Tan, L.N., Williams, B. (2013) All for one: feature combination for highly channel-degraded speech activity detection. Proc. Interspeech 2013, 709-713, doi: 10.21437/Interspeech.2013-199

@inproceedings{graciarena13_interspeech,
  author={Martin Graciarena and Abeer Alwan and Dan Ellis and Horacio Franco and Luciana Ferrer and John H. L. Hansen and Adam Janin and Byung-Suk Lee and Yun Lei and Vikramjit Mitra and Nelson Morgan and Seyed Omid Sadjadi and T. J. Tsai and Nicolas Scheffer and Lee Ngee Tan and Benjamin Williams},
  title={{All for one: feature combination for highly channel-degraded speech activity detection}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={709--713},
  doi={10.21437/Interspeech.2013-199}
}