10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Entropy Based Overlapped Speech Detection as a Pre-Processing Stage for Speaker Diarization

Oshry Ben-Harush (1), Itshak Lapidot (2), Hugo Guterman (1)

(1) Ben-Gurion University of the Negev, Israel
(2) Sami Shamoon College of Engineering, Israel

One inherent deficiency of most diarization systems is their inability to handle co-channel or overlapped speech. Most of the algorithms suggested for this problem perform under singular conditions or require high computational complexity in both the time and frequency domains.

In this study, frame-based entropy analysis of the audio data in the time domain serves as the single feature for an overlapped speech detection algorithm. Overlapped speech segments are identified using Gaussian Mixture Modeling (GMM) along with well-known classification algorithms applied to two-speaker conversations. This methodology eliminates the need to set a hard threshold for each conversation or database.
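The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the histogram-based entropy estimate, the bin count, the amplitude range, the two-component mixture, and the rule that the higher-entropy component marks overlapped speech are all assumptions made for illustration, and the maximum-posterior decision stands in for the unspecified "well-known classification algorithms". All function names are hypothetical.

```python
import math


def split_frames(signal, frame_len, hop):
    """Cut a list of samples into fixed-length frames (hypothetical helper)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_entropy(frame, n_bins=32, lo=-1.0, hi=1.0):
    """Shannon entropy (bits) of the frame's amplitude histogram.

    Assumes samples are normalised to [lo, hi]; values outside are clamped.
    """
    counts = [0] * n_bins
    for x in frame:
        pos = (x - lo) / (hi - lo) * n_bins
        counts[min(max(int(pos), 0), n_bins - 1)] += 1
    n = len(frame)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances). Means start evenly spaced
    between the smallest and largest value.
    """
    lo, hi = min(values), max(values)
    means = [lo + (k + 0.5) / n_components * (hi - lo)
             for k in range(n_components)]
    mean_all = sum(values) / len(values)
    var_all = sum((v - mean_all) ** 2 for v in values) / len(values) or 1e-6
    variances = [var_all] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            p = [w * gaussian_pdf(v, m, s)
                 for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2
                        for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)
    return weights, means, variances


def detect_overlap(entropies):
    """Flag frames whose most likely component is the highest-mean one.

    Assumption: the higher-entropy component models overlapped speech.
    """
    weights, means, variances = fit_gmm_1d(entropies)
    overlap_k = max(range(len(means)), key=lambda k: means[k])
    flags = []
    for h in entropies:
        post = [w * gaussian_pdf(h, m, s)
                for w, m, s in zip(weights, means, variances)]
        flags.append(max(range(len(post)), key=lambda k: post[k]) == overlap_k)
    return flags
```

Because the decision boundary comes from the mixture fitted to each conversation's own entropy values, no fixed entropy threshold has to be chosen in advance, which is the property the abstract highlights. A typical call would be `detect_overlap([frame_entropy(f) for f in split_frames(samples, 400, 200)])`, where the frame length and hop are again illustrative values.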

The LDC CALLHOME American English corpus is used to evaluate the suggested algorithm. The proposed method successfully detects 63.2% of the frames labeled as overlapped speech by the manual segmentation, while keeping the false-alarm rate at 5.4%.
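As a hedged illustration of how frame-level figures of this kind can be computed, the sketch below scores predicted overlap flags against manual reference labels. The abstract does not state how the false-alarm rate is normalised; the sketch assumes it is the fraction of non-overlapped reference frames that were flagged as overlapped. The function name is hypothetical.

```python
def detection_and_false_alarm(reference, predicted):
    """Return (detection rate, false-alarm rate) for per-frame boolean labels.

    Detection rate: flagged overlap frames / all reference overlap frames.
    False-alarm rate (assumed definition): flagged non-overlap frames /
    all reference non-overlap frames.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    hits = sum(1 for r, p in zip(reference, predicted) if r and p)
    false_alarms = sum(1 for r, p in zip(reference, predicted) if not r and p)
    n_overlap = sum(1 for r in reference if r)
    n_clean = len(reference) - n_overlap
    detection_rate = hits / n_overlap if n_overlap else 0.0
    false_alarm_rate = false_alarms / n_clean if n_clean else 0.0
    return detection_rate, false_alarm_rate
```

For example, with 4 reference overlap frames of which 3 are flagged, and 6 non-overlap frames of which 1 is flagged, the function returns a detection rate of 0.75 and a false-alarm rate of 1/6.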


Bibliographic reference.  Ben-Harush, Oshry / Lapidot, Itshak / Guterman, Hugo (2009): "Entropy based overlapped speech detection as a pre-processing stage for speaker diarization", In INTERSPEECH-2009, 916-919.