ODYSSEY 2004 - The Speaker and Language Recognition Workshop

May 31 - June 3, 2004
Toledo, Spain

Speaker Diarisation for Broadcast News

S. E. Tranter (1), Douglas A. Reynolds (2)

(1) Cambridge University Engineering Department, UK
(2) MIT-Lincoln Laboratory, Lexington, MA, USA

It is often important to automatically label ‘who spoke when’ in audio data. This paper describes two audio segmentation systems, developed at CUED and MIT-LL, and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure and BIC-based stopping criterion are introduced for the CUED system, improving both performance and robustness to changes in segmentation. Finally, a hybrid ‘Plug and Play’ system is built which combines different parts of the CUED and MIT-LL systems into a single system that outperforms both individual systems.


Bibliographic reference. Tranter, S. E. / Reynolds, Douglas A. (2004): "Speaker diarisation for broadcast news", In ODYS-2004, 337-344.