ISCA Archive Eurospeech 1999

Who spoke when? - automatic segmentation and clustering for determining speaker turns

S. E. Johnson

The problem of labelling speaker turns by automatically segmenting and clustering a continuous audio stream is addressed. A new clustering scheme is presented and evaluated using a clustering efficiency score which treats both agglomerative and divisive clustering strategies equally. Results show that an efficiency of 70% can be obtained on both manually and automatically derived segments on the 1996 Hub4 development data. For the task of identifying potentially unknown anchor speakers within broadcast news shows, the frame classification error rate is very important. To reflect this, a frame-based cluster efficiency is defined, and the results show that a 90% frame-based efficiency can be achieved. Finally, a frame-based comparison between the manually and automatically derived segment/cluster sets shows that approximately one third of the errors are introduced during segmentation and two thirds during clustering.
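The abstract does not give the formula for the frame-based cluster efficiency, so the following is only an illustrative sketch under stated assumptions, not the paper's definition. It assumes per-frame reference speaker labels and hypothesised cluster labels. It then measures cluster purity (the fraction of frames belonging to each cluster's dominant speaker, which drops when a cluster merges speakers) and speaker purity (the fraction of frames falling in each speaker's dominant cluster, which drops when a speaker is split across clusters). Finally, it combines the two with a geometric mean, so that over-merging and over-splitting are penalised symmetrically. The function name `frame_efficiency` and the geometric-mean combination are hypothetical choices.

```python
from collections import Counter
from math import sqrt


def frame_efficiency(ref, hyp):
    """Hypothetical frame-based purity/efficiency sketch (not the paper's formula).

    ref: per-frame reference speaker labels
    hyp: per-frame hypothesised cluster labels (same length as ref)
    Returns (cluster_purity, speaker_purity, efficiency).
    """
    if not ref or len(ref) != len(hyp):
        raise ValueError("ref and hyp must be non-empty and of equal length")
    n = len(ref)
    # Count frames for every (speaker, cluster) pair.
    joint = Counter(zip(ref, hyp))

    best_per_cluster = {}  # frames of the dominant speaker in each cluster
    best_per_speaker = {}  # frames of each speaker in its dominant cluster
    for (spk, clu), count in joint.items():
        best_per_cluster[clu] = max(best_per_cluster.get(clu, 0), count)
        best_per_speaker[spk] = max(best_per_speaker.get(spk, 0), count)

    cluster_purity = sum(best_per_cluster.values()) / n  # penalises merging speakers
    speaker_purity = sum(best_per_speaker.values()) / n  # penalises splitting a speaker
    # Assumed combination: geometric mean treats both error directions equally.
    efficiency = sqrt(cluster_purity * speaker_purity)
    return cluster_purity, speaker_purity, efficiency
```

For example, reference labels `A A A B B C` scored against clusters `1 1 2 2 2 2` give a cluster purity of 4/6 (cluster 2 mixes three speakers) and a speaker purity of 5/6 (speaker A is split across two clusters).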


doi: 10.21437/Eurospeech.1999-490

Cite as: Johnson, S.E. (1999) Who spoke when? - automatic segmentation and clustering for determining speaker turns. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2211-2214, doi: 10.21437/Eurospeech.1999-490

@inproceedings{johnson99_eurospeech,
  author={S. E. Johnson},
  title={{Who spoke when? - automatic segmentation and clustering for determining speaker turns}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2211--2214},
  doi={10.21437/Eurospeech.1999-490}
}