ODYSSEY 2004 - The Speaker and Language Recognition Workshop
May 31 - June 3, 2004
It is often important to be able to automatically label ‘who spoke when’ in audio data. This paper describes two systems for audio segmentation, developed at CUED and MIT-LL, and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure and BIC-based stopping criterion are introduced for the CUED system, which improve both performance and robustness to changes in segmentation. Finally, a hybrid ‘Plug and Play’ system is built by combining different parts of the CUED and MIT-LL systems; this single system outperforms both of the individual systems.
Bibliographic reference. Tranter, S. E. / Reynolds, Douglas A. (2004): "Speaker diarisation for broadcast news", In ODYS-2004, 337-344.