ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge

Zbyněk Zajíc, Marie Kunešová, Jan Zelinka, Marek Hrúz


In this paper, we present the system developed by the team from the New Technologies for the Information Society (NTIS) research center of the University of West Bohemia for the First DIHARD Speech Diarization Challenge. The base of our system follows the currently standard approach of segmentation, i-vector extraction, clustering and resegmentation. Here, we describe the modifications which allowed us to apply the system to data from a range of different domains. The main contribution is an ANN-based domain classifier, which assigns each conversation to one of the ten domains present in the development set. This classification determines the specific system configuration, such as the expected number of speakers and the stopping criterion for the hierarchical clustering. At the time of writing, our best submission achieves a DER of 26.90% and an MI of 8.34 bits on the evaluation set (gold speech/nonspeech segmentation).
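
The idea of letting a predicted domain select the clustering configuration can be illustrated with a minimal sketch. This is not the authors' implementation: the domain labels, the `DomainConfig` fields, the threshold values, the use of cosine distance with average linkage, and the rule that merging continues while the cluster count exceeds the speaker cap are all assumptions made for illustration. The domain label is taken as given here, standing in for the output of the ANN-based classifier.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class DomainConfig:
    max_speakers: int      # hypothetical cap on the number of speakers
    stop_threshold: float  # hypothetical cosine-distance stopping threshold


# Placeholder domain labels and values; none of these come from the paper.
CONFIGS = {
    "domain_a": DomainConfig(max_speakers=2, stop_threshold=0.95),
    "domain_b": DomainConfig(max_speakers=10, stop_threshold=0.3),
}


def cosine_distance(u, v):
    # Assumes both vectors are non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(c1, c2, vectors):
    # Mean pairwise distance between the members of two clusters.
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(vectors, config):
    # Agglomerative clustering over segment embeddings (e.g. i-vectors).
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        dist, a, b = min(
            (average_linkage(clusters[a], clusters[b], vectors), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        # Stop once the closest pair is too far apart, provided the
        # number of clusters does not exceed the domain's speaker cap.
        if dist > config.stop_threshold and len(clusters) <= config.max_speakers:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def diarize(vectors, domain):
    # `domain` stands in for the output of a domain classifier.
    return cluster(vectors, CONFIGS[domain])


if __name__ == "__main__":
    segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(diarize(segments, "domain_a"))
    print(diarize(segments, "domain_b"))
```

With the same four toy embeddings, the two placeholder configurations yield different numbers of clusters, which is the behaviour the domain-dependent stopping criterion is meant to provide.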


DOI: 10.21437/Interspeech.2018-1252

Cite as: Zajíc, Z., Kunešová, M., Zelinka, J., Hrúz, M. (2018) ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge. Proc. Interspeech 2018, 2788-2792, DOI: 10.21437/Interspeech.2018-1252.


@inproceedings{Zajíc2018,
  author={Zbyněk Zajíc and Marie Kunešová and Jan Zelinka and Marek Hrúz},
  title={ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2788--2792},
  doi={10.21437/Interspeech.2018-1252},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1252}
}