10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speaker Segmentation and Clustering for Simultaneously Presented Speech

Lingyun Gu, Richard M. Stern

Carnegie Mellon University, USA

This paper proposes a new scheme for segmenting and clustering speech on an unsupervised basis in cases where multiple speakers are presented simultaneously at different SNRs. The new elements in our work are the development of a new feature for segmenting and clustering simultaneously presented speech, a procedure for identifying a candidate set of possible speaker-change points, and the use of pair-wise cross-segment distance distributions to cluster segments by speaker. The proposed system is evaluated in terms of the F measure and is compared to a baseline system that uses MFCCs as acoustic features, the Bayesian Information Criterion (BIC) for detecting speaker-change points, and the Kullback-Leibler distance for clustering the segments. Experimental results indicate that the new system consistently provides better performance than the baseline system at very small computational cost.
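For readers unfamiliar with the baseline, the following is a minimal sketch of BIC-based speaker-change detection as it is commonly formulated, assuming each segment is modeled by a single full-covariance Gaussian. It illustrates only the baseline criterion named in the abstract, not the proposed feature or clustering procedure. The function name `delta_bic`, the `penalty_weight` parameter, and the synthetic data are hypothetical and do not come from the paper.

```python
import numpy as np


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for a candidate change point (hypothetical helper).

    frames: (N, d) array of feature vectors, e.g. MFCC frames.
    split:  index separating the two hypothesized speaker segments.
    A positive value favors the two-model (speaker change) hypothesis.
    """
    n, d = frames.shape
    left, right = frames[:split], frames[split:]

    def logdet_cov(x):
        # Maximum-likelihood (biased) covariance and its log-determinant.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain from modeling the two sides separately.
    gain = 0.5 * (n * logdet_cov(frames)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))

    # Penalty for the extra mean and covariance parameters.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


# Synthetic illustration (assumed data, not from the paper).
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 4))
speaker_b = rng.normal(5.0, 1.0, size=(200, 4))
changed = np.vstack([speaker_a, speaker_b])
unchanged = rng.normal(0.0, 1.0, size=(400, 4))

score_changed = delta_bic(changed, 200)
score_unchanged = delta_bic(unchanged, 200)
```

In this sketch a window is declared to contain a speaker change when the score is positive. Real systems typically scan many candidate split points and tune the penalty weight on held-out data.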

Bibliographic reference.  Gu, Lingyun / Stern, Richard M. (2009): "Speaker segmentation and clustering for simultaneously presented speech", In INTERSPEECH-2009, 2551-2554.