ISCA Archive Interspeech 2009

Entropy based overlapped speech detection as a pre-processing stage for speaker diarization

Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman

One inherent deficiency of most diarization systems is their inability to handle co-channel or overlapped speech. Most of the suggested algorithms perform under singular conditions and require high computational complexity in both the time and frequency domains.

In this study, frame-based entropy analysis of the audio data in the time domain serves as the single feature for an overlapped speech detection algorithm. Overlapped speech segments are identified using Gaussian Mixture Modeling (GMM) along with well-known classification algorithms applied to two-speaker conversations. With this methodology, the proposed method eliminates the need to set a hard threshold for each conversation or database.
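As an illustration only, the idea of a single time-domain entropy feature classified by a mixture model can be sketched as follows. The abstract does not give the entropy estimator, the frame size, the number of mixture components, or the classifier, so everything here is an assumption: entropy is estimated from a per-frame amplitude histogram, a two-component one-dimensional Gaussian mixture is fitted with plain EM, and the component with the higher mean entropy is assumed to correspond to overlapped speech. The names `frame_entropy`, `fit_gmm_1d`, and `detect_overlap` and all parameter values are hypothetical.

```python
import numpy as np

def frame_entropy(signal, frame_len=400, hop=200, n_bins=32):
    """Shannon entropy (bits) of the amplitude histogram of each frame."""
    signal = np.asarray(signal, dtype=float)
    # Shared bin edges over the whole recording keep frames comparable.
    edges = np.linspace(signal.min(), signal.max(), n_bins + 1)
    entropies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        counts, _ = np.histogram(signal[start:start + frame_len], bins=edges)
        p = counts[counts > 0] / frame_len
        entropies.append(-np.sum(p * np.log2(p)))
    return np.array(entropies)

def fit_gmm_1d(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (assumed model size)."""
    x = np.asarray(x, dtype=float)
    means = np.percentile(x, [25, 75])
    variances = np.full(2, x.var() + 1e-6)
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        diff = x[:, None] - means
        dens = weights * np.exp(-0.5 * diff ** 2 / variances)
        dens /= np.sqrt(2 * np.pi * variances)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return means, resp

def detect_overlap(signal, **kwargs):
    """Return one boolean per frame; True marks the assumed overlap component."""
    ent = frame_entropy(signal, **kwargs)
    means, resp = fit_gmm_1d(ent)
    # Assumption: overlapped speech yields the higher-entropy component.
    return resp.argmax(axis=1) == means.argmax()
```

Because the decision comes from the mixture posteriors fitted on each recording, no fixed entropy threshold has to be chosen in advance, which is the property the abstract highlights.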

The LDC CALLHOME American English corpus is used to evaluate the suggested algorithm. The proposed method successfully detects 63.2% of the frames labeled as overlapped speech by the manual segmentation, while keeping a 5.4% false-alarm rate.
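For readers who want to compute comparable frame-level figures, a minimal sketch is given below. The abstract does not state how the false-alarm rate is normalized; this sketch assumes it is the fraction of frames not labeled as overlapped in the manual segmentation that the detector nevertheless flags. The function name `detection_and_false_alarm` is hypothetical.

```python
import numpy as np

def detection_and_false_alarm(reference, hypothesis):
    """Frame-level detection rate and false-alarm rate.

    reference:  per-frame manual labels (True = overlapped speech)
    hypothesis: per-frame detector output (True = overlap detected)
    """
    reference = np.asarray(reference, dtype=bool)
    hypothesis = np.asarray(hypothesis, dtype=bool)
    hits = np.sum(reference & hypothesis)
    false_alarms = np.sum(~reference & hypothesis)
    # Guard against empty classes to avoid division by zero.
    detection_rate = hits / max(reference.sum(), 1)
    # Assumed normalization: false alarms over all non-overlap frames.
    false_alarm_rate = false_alarms / max((~reference).sum(), 1)
    return detection_rate, false_alarm_rate
```

Under this assumed definition, the reported figures would correspond to a detection rate of 0.632 and a false-alarm rate of 0.054.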


doi: 10.21437/Interspeech.2009-275

Cite as: Ben-Harush, O., Lapidot, I., Guterman, H. (2009) Entropy based overlapped speech detection as a pre-processing stage for speaker diarization. Proc. Interspeech 2009, 916-919, doi: 10.21437/Interspeech.2009-275

@inproceedings{benharush09_interspeech,
  author={Oshry Ben-Harush and Itshak Lapidot and Hugo Guterman},
  title={{Entropy based overlapped speech detection as a pre-processing stage for speaker diarization}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={916--919},
  doi={10.21437/Interspeech.2009-275}
}