ISCA Archive Interspeech 2009

Speaker segmentation and clustering for simultaneously presented speech

Lingyun Gu, Richard M. Stern

This paper proposes a new scheme for segmenting and clustering speech on an unsupervised basis in cases where multiple speakers are presented simultaneously at different SNRs. The new elements in our work are the development of a new feature for segmenting and clustering simultaneously presented speech, a procedure for identifying a candidate set of possible speaker-change points, and the use of pair-wise cross-segment distance distributions to cluster segments by speaker. The proposed system is evaluated in terms of the F measure. It is compared to a baseline system that uses MFCC acoustic features, the Bayesian Information Criterion (BIC) for detecting speaker-change points, and the Kullback-Leibler distance for clustering the segments. Experimental results indicate that the new system consistently provides better performance than the baseline system at very small computational cost.
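
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of an F measure computed over detected speaker-change points. The abstract does not say how detections are matched to reference change points, so the one-to-one matching within a time tolerance, the function name `f_measure`, and the `tolerance` parameter are all assumptions made for this example, not the authors' procedure.

```python
def f_measure(reference, hypothesis, tolerance=0.5):
    """F measure for change points given in seconds.

    Assumption (not from the paper): a hypothesized change point counts as
    correct if it lies within `tolerance` seconds of a reference change
    point, and each reference point can be matched at most once.
    """
    ref = sorted(reference)
    hyp = sorted(hypothesis)

    # Greedy left-to-right matching of the two sorted lists.
    matched = 0
    i = j = 0
    while i < len(ref) and j < len(hyp):
        if abs(ref[i] - hyp[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif hyp[j] < ref[i]:
            j += 1  # spurious detection (false alarm)
        else:
            i += 1  # missed reference change point

    precision = matched / len(hyp) if hyp else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical change-point times, for illustration only.
    reference_points = [1.0, 5.0, 9.0]
    detected_points = [1.2, 5.1, 7.0, 12.0]
    print(round(f_measure(reference_points, detected_points), 3))  # 0.571
```

In this made-up example two of the four detections match two of the three reference points, giving a precision of 0.5, a recall of about 0.667, and an F measure of 4/7.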


doi: 10.21437/Interspeech.2009-672

Cite as: Gu, L., Stern, R.M. (2009) Speaker segmentation and clustering for simultaneously presented speech. Proc. Interspeech 2009, 2551-2554, doi: 10.21437/Interspeech.2009-672

@inproceedings{gu09_interspeech,
  author={Lingyun Gu and Richard M. Stern},
  title={{Speaker segmentation and clustering for simultaneously presented speech}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2551--2554},
  doi={10.21437/Interspeech.2009-672}
}