This paper presents our approach to unsupervised multi-speaker conversational speech segmentation.
Speech segmentation is obtained in two steps that employ different techniques. The first step performs a preliminary segmentation of the conversation by analyzing fixed-length slices, assuming that each slice contains one or two speakers. The second step clusters the segments obtained in the first step, estimates the number of speakers, and refines the segment boundaries using more accurate models.
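The two-step pipeline can be sketched as follows. This is a minimal illustration under assumed simplifications: the helper names, the Euclidean distance, and the scalar features are placeholders, not the paper's actual acoustic models or clustering criterion.

```python
# Hypothetical sketch of the two-step segmentation pipeline.
# Features and distances are stand-ins for the paper's model-based scoring.

def slice_fixed(num_frames, slice_len):
    """Step 1: split a conversation of `num_frames` frames into
    fixed-length slices (the last slice may be shorter)."""
    return [(s, min(s + slice_len, num_frames))
            for s in range(0, num_frames, slice_len)]

def cluster_segments(features, threshold):
    """Step 2: greedy agglomerative clustering of slice-level features.
    Each feature is a vector; Euclidean distance to the running cluster
    centroid stands in for a model-based similarity.  The number of
    resulting clusters is the estimated number of speakers."""
    clusters = []  # list of [centroid, member_indices]
    for i, f in enumerate(features):
        best, best_d = None, threshold
        for c in clusters:
            d = sum((a - b) ** 2 for a, b in zip(c[0], f)) ** 0.5
            if d < best_d:
                best, best_d = c, d
        if best is None:
            clusters.append([list(f), [i]])
        else:
            best[1].append(i)
            n = len(best[1])  # update centroid incrementally
            best[0][:] = [(c * (n - 1) + x) / n for c, x in zip(best[0], f)]
    return clusters
```

For example, `slice_fixed(100, 30)` yields the slices `[(0, 30), (30, 60), (60, 90), (90, 100)]`, and clustering four slice features from two well-separated speakers yields two clusters, i.e. an estimate of two speakers.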
We evaluated our algorithms on the speaker segmentation tasks proposed by the 2000 NIST speaker recognition evaluation, where the proposed approach produces state-of-the-art segmentation error rates, and on the 2004 NIST multi-speaker conversation tests, where we compare the verification performance obtained using automatically segmented training data with that obtained using single-speaker data.
Cite as: Dalmasso, E., Laface, P., Colibro, D., Vair, C. (2005) Unsupervised segmentation and verification of multi-speaker conversational speech. Proc. Interspeech 2005, 3053-3056, doi: 10.21437/Interspeech.2005-654
@inproceedings{dalmasso05_interspeech, author={Emanuele Dalmasso and Pietro Laface and Daniele Colibro and Claudio Vair}, title={{Unsupervised segmentation and verification of multi-speaker conversational speech}}, year={2005}, booktitle={Proc. Interspeech 2005}, pages={3053--3056}, doi={10.21437/Interspeech.2005-654} }