14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

The IBM Speech Activity Detection System for the DARPA RATS Program

George Saon, Samuel Thomas, Hagen Soltau, Sriram Ganapathy, Brian Kingsbury

IBM T.J. Watson Research Center, USA

We present the IBM speech activity detection system that was fielded in the phase 2 evaluation of the DARPA RATS (robust automatic transcription of speech) program. Key ingredients of the system are: multi-pass HMM Viterbi segmentation, fusion of multiple feature streams, file-based and speech-based normalization schemes, the use of regular and convolutional deep neural networks, and model fusion through frame-level score combination of channel-dependent models. These techniques were instrumental in achieving a 1.4% equal error rate on the RATS phase 2 evaluation data.
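Two of the ingredients named above, frame-level score combination and HMM Viterbi segmentation, can be illustrated with a minimal sketch. It is not the system described in the paper. It assumes each model emits one speech log-likelihood-ratio score per frame, fuses models by an equal-weight average, and decodes with a two-state (speech / non-speech) HMM in a single pass. The function names `fuse_scores` and `viterbi_segment`, the equal weights, and the `switch_penalty` value are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def fuse_scores(streams: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame speech scores across models (equal weights are an assumption)."""
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all score streams must have the same number of frames")
    return [sum(s[t] for s in streams) / len(streams) for t in range(n_frames)]


def viterbi_segment(scores: Sequence[float], switch_penalty: float = 2.0) -> List[int]:
    """Two-state Viterbi decoding: 0 = non-speech, 1 = speech.

    scores[t] is treated as a log-likelihood ratio for speech at frame t:
    the speech state earns +scores[t], the non-speech state earns -scores[t],
    and every state change costs switch_penalty (a hypothetical value).
    """
    if not scores:
        return []
    best = [-scores[0], scores[0]]  # best path score ending in each state
    backptr = []                    # backptr[t-1][s] = best predecessor of state s at frame t
    for x in scores[1:]:
        emit = (-x, x)
        new_best = [0.0, 0.0]
        ptr = [0, 0]
        for s in (0, 1):
            stay = best[s]
            switch = best[1 - s] - switch_penalty
            if stay >= switch:
                new_best[s], ptr[s] = stay + emit[s], s
            else:
                new_best[s], ptr[s] = switch + emit[s], 1 - s
        best = new_best
        backptr.append(ptr)
    # Trace back from the better final state.
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    model_a = [-1.0, -3.0, 2.0, 4.0, -1.0, 3.0, 3.0, -2.0, -2.0]
    model_b = [-3.0, -1.0, 4.0, 2.0, 0.0, 3.0, 3.0, -2.0, -2.0]
    print(viterbi_segment(fuse_scores([model_a, model_b])))
```

In this toy input the fifth frame has a slightly negative fused score, but the switch penalty keeps it inside the surrounding speech segment. That smoothing effect is the reason for decoding with an HMM rather than thresholding each frame independently.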

Bibliographic reference. Saon, George / Thomas, Samuel / Soltau, Hagen / Ganapathy, Sriram / Kingsbury, Brian (2013): "The IBM speech activity detection system for the DARPA RATS program", in INTERSPEECH-2013, 3497-3501.