ISCA Archive Interspeech 2006

BINSEG: an efficient speaker-based segmentation technique

Jindrich Zdansky

In this paper we present a new, efficient approach to speaker-based audio stream segmentation. It employs the binary segmentation technique, which is well known from mathematical statistics. Because hypothesis testing is an integral part of this technique, we compare two well-founded approaches (Maximum Likelihood, Informational) and one commonly used approach (BIC difference) for deriving speaker-change test statistics. Based on the results of this comparison, we propose both off-line and on-line speaker change detection algorithms (including a method of effective training) that combine high accuracy with low computational cost. In simulated tests with artificially mixed data, the on-line algorithm identified 95.7% of all speaker changes with a precision of 96.9%. In tests on 30 hours of real broadcast news (in 9 languages), the average recall was 74.4% and the average precision 70.3%.
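To illustrate the general idea of binary segmentation driven by a BIC-difference test, here is a minimal sketch. It is not the paper's algorithm: the abstract does not give the test statistics, features, or training procedure, so this example assumes a one-dimensional feature stream modelled by a single Gaussian per segment and uses the textbook BIC difference. The names `log_var`, `delta_bic`, `binary_segmentation`, and the parameters `lam` and `min_len` are hypothetical, and the naive search recomputes variances for every candidate split, so it makes no attempt at the efficiency the paper targets.

```python
import math
import random


def log_var(xs, floor=1e-6):
    """Log of the maximum-likelihood variance of xs (floored to avoid log(0))."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, floor))


def delta_bic(xs, t, lam=1.0):
    """BIC difference for splitting xs at index t (assumed 1-D Gaussian models).

    Positive values favour the two-segment (speaker change) hypothesis.
    """
    n = len(xs)
    left, right = xs[:t], xs[t:]
    gain = 0.5 * (n * log_var(xs)
                  - len(left) * log_var(left)
                  - len(right) * log_var(right))
    # A 1-D Gaussian has 2 free parameters (mean, variance).
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty


def binary_segmentation(xs, start=0, min_len=20, lam=1.0):
    """Recursively split xs at the best change point while delta_bic > 0.

    Returns the sorted list of detected change indices (relative to the
    original stream, via the `start` offset).
    """
    n = len(xs)
    if n < 2 * min_len:
        return []
    candidates = range(min_len, n - min_len + 1)
    best_t = max(candidates, key=lambda t: delta_bic(xs, t, lam))
    if delta_bic(xs, best_t, lam) <= 0:
        return []  # no significant change in this segment
    left = binary_segmentation(xs[:best_t], start, min_len, lam)
    right = binary_segmentation(xs[best_t:], start + best_t, min_len, lam)
    return left + [start + best_t] + right
```

On a synthetic stream whose mean shifts partway through, calling `binary_segmentation(stream, lam=2.0)` should return a change index near the shift. Real speaker segmentation would replace the scalar samples with multidimensional acoustic feature vectors and full covariance matrices.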

doi: 10.21437/Interspeech.2006-567

Cite as: Zdansky, J. (2006) BINSEG: an efficient speaker-based segmentation technique. Proc. Interspeech 2006, paper 1459-Thu1A1O.2, doi: 10.21437/Interspeech.2006-567

  author={Jindrich Zdansky},
  title={{BINSEG: an efficient speaker-based segmentation technique}},
  booktitle={Proc. Interspeech 2006},
  pages={paper 1459-Thu1A1O.2},