The speaker diarization task consists of inferring "who spoke when" in an audio stream without any prior knowledge, and it has been the object of several international NIST evaluation campaigns in recent years. A common trend for improving performance has been the use of several different feature streams, as diverse as speaker location features, visual features, or noise-robust acoustic features. This paper describes an open source toolkit, released under the GPL license, that aims to facilitate research in multistream speaker diarization and to reproduce state-of-the-art results. In contrast to other related diarization toolkits, it is explicitly designed to handle an arbitrary number of features with very different statistics while limiting the computational complexity. The release includes a set of recipe scripts to replicate benchmark results on previous NIST evaluations and is intended to provide easy-to-use software for studying novel features and including them in diarization systems.
Index Terms: open source toolkit, speaker diarization, multistream features, NIST Rich Transcription
Bibliographic reference. Vijayasenan, Deepu / Valente, Fabio (2012): "Diartk: an open source toolkit for research in multistream speaker diarization and its application to meetings recordings", In INTERSPEECH-2012, 2170-2173.