ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge

Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel, Eduardo Lleida


This paper presents the latest improvements in speaker diarization obtained by the ViVoLAB research group for the 2019 DIHARD Diarization Challenge. This evaluation seeks to improve diarization in adverse conditions. For this purpose, the audio recordings cover multiple scenarios with no restrictions on the number of speakers, overlapped speech, or audio quality. Our submission follows the traditional segmentation-clustering-resegmentation pipeline: speaker embeddings are extracted from acoustic segments that contain a single speaker and are later clustered by means of a PLDA model. Our contribution in this work focuses on the clustering step. We present results with our Variational Bayes PLDA clustering and our tree-based clustering strategy, which sequentially assigns each embedding to its corresponding speaker according to a PLDA model. Both strategies compare multiple diarization hypotheses and select their final candidate according to a generative criterion. We also analyze the impact of the different state-of-the-art embeddings available on both clustering approaches.
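
To make the idea of sequential assignment concrete, the following is a minimal sketch and not the authors' system. It assumes a greedy, single-hypothesis variant: each embedding is scored against the speakers found so far and either joins the best match or opens a new speaker. Cosine similarity stands in for the PLDA score, and the function names, the threshold value, and the toy two-dimensional embeddings are all hypothetical. The tree-based strategy described above keeps and compares multiple hypotheses, which this sketch does not do.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors (stand-in for a PLDA score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sequential_assign(embeddings, threshold=0.5):
    """Greedily assign each embedding, in order, to a speaker label.

    Assumption: `threshold` is a hypothetical decision boundary on the
    stand-in score; a real system would use a calibrated PLDA score.
    """
    speaker_sums = []  # running sum of the embeddings assigned to each speaker
    labels = []
    for emb in embeddings:
        scores = [cosine(emb, s) for s in speaker_sums]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else None
        if best is not None and scores[best] >= threshold:
            # Close enough to an existing speaker: update that speaker's sum.
            speaker_sums[best] = [u + v for u, v in zip(speaker_sums[best], emb)]
            labels.append(best)
        else:
            # No existing speaker matches: open a new one.
            speaker_sums.append(list(emb))
            labels.append(len(speaker_sums) - 1)
    return labels


# Toy data: two well-separated directions in 2-D.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = sequential_assign(toy)
print(labels)  # [0, 0, 1, 1, 0]
```

A greedy pass like this commits to each decision immediately, so an early mistake cannot be undone; comparing several hypotheses under a generative criterion, as the abstract describes, is one way to mitigate that.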


DOI: 10.21437/Interspeech.2019-2462

Cite as: Viñals, I., Gimeno, P., Ortega, A., Miguel, A., Lleida, E. (2019) ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge. Proc. Interspeech 2019, 988-992, DOI: 10.21437/Interspeech.2019-2462.


@inproceedings{Viñals2019,
  author={Ignacio Viñals and Pablo Gimeno and Alfonso Ortega and Antonio Miguel and Eduardo Lleida},
  title={{ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={988--992},
  doi={10.21437/Interspeech.2019-2462},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2462}
}