12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Online Speech Activity Detection in Broadcast News

Chao Gao, Guruprasad Saikumar, Saurabh Khanwalkar, Avi Herscovici, Anoop Kumar, Amit Srivastava, Premkumar Natarajan

Raytheon BBN Technologies, USA

In this paper, we investigate the implications of real-time processing for the design of a speech activity detection (SAD) system, focusing on the constraints imposed by online automatic speech recognition. Our investigation is built on a real-life application of speech technology, the BBN Broadcast Monitoring System (BMS), which encapsulates a real-time automatic rich transcription system. We propose a segmentation method capable of variable-scale speech boundary detection in an online SAD system, and we evaluate how different granularities of boundary detection affect the performance of speech-to-text (STT) and speaker diarization. In addition, we evaluate the interactions between STT and speaker diarization and study mechanisms for trading off the performance of these two system components. In our experiment, the segmentation mechanism in the proposed SAD system reduces the error rates of STT and speaker diarization by 2.4% and 9.5% relative, respectively, compared to the baseline system.
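
The improvements quoted in the abstract are relative reductions, that is, fractions of the baseline error rate rather than absolute percentage points. A minimal sketch of that arithmetic follows. The function name and the baseline and new error rates are hypothetical values chosen for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, for illustration only (not from the paper).
hypothetical_baseline_error = 20.0
hypothetical_new_error = 19.52

reduction = relative_reduction(hypothetical_baseline_error, hypothetical_new_error)
absolute_drop = hypothetical_baseline_error - hypothetical_new_error

# A small absolute drop (under half a point here) can correspond to a
# relative reduction of a few percent.
print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative reduction: {reduction:.1%}")
```

Under these assumed numbers, an absolute drop of 0.48 points on a 20% baseline works out to a 2.4% relative reduction, which is how a figure of this kind would typically be read.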

Bibliographic reference. Gao, Chao / Saikumar, Guruprasad / Khanwalkar, Saurabh / Herscovici, Avi / Kumar, Anoop / Srivastava, Amit / Natarajan, Premkumar (2011): "Online speech activity detection in broadcast news", in INTERSPEECH-2011, 2637-2640.