Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge.

Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel, Eduardo Lleida


This paper focuses on estimating the number of speakers for diarization in the context of the DIHARD Challenge at Interspeech 2018. This evaluation seeks to improve diarization on challenging corpora (YouTube videos, meetings, courtroom recordings, etc.) that contain an undetermined number of speakers whose speech contributions differ widely in relevance. Our proposal for the challenge is a system based on the i-vector PLDA paradigm: given an initial segmentation of the input audio, we extract an i-vector representation for each acoustic fragment. These i-vectors are clustered with a fully Bayesian PLDA. This model, a generative model in which the speaker labels are latent variables, produces the diarization labels by means of Variational Bayes iterations. The number of speakers is decided by comparing multiple hypotheses according to different information criteria, all of which are built around the Evidence Lower Bound (ELBO) provided by our PLDA.
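The abstract does not state which information criteria are used, so the following is only a minimal sketch of the general idea of picking the number of speakers by penalizing the ELBO. It assumes BIC- and AIC-style penalties and a hypothetical parameter count of `speaker_dim` free parameters per speaker; the function names and the ELBO values in `example_elbos` are invented for illustration. In a real system each ELBO would come from a separate Variational Bayes PLDA run with that many speaker clusters.

```python
import math
from typing import Dict


def penalized_elbo(elbo: float, n_params: int, n_segments: int,
                   criterion: str = "bic") -> float:
    """Subtract a complexity penalty from an ELBO (hypothetical criteria)."""
    if criterion == "aic":
        # AIC-style: one unit of penalty per free parameter.
        return elbo - n_params
    if criterion == "bic":
        # BIC-style: penalty grows with log of the number of observations
        # (here, the number of i-vector segments).
        return elbo - 0.5 * n_params * math.log(n_segments)
    raise ValueError(f"unknown criterion: {criterion}")


def select_num_speakers(elbos: Dict[int, float], n_segments: int,
                        speaker_dim: int, criterion: str = "bic") -> int:
    """Return the speaker count whose penalized ELBO is largest.

    Assumption: a hypothesis with k speakers has k * speaker_dim free
    parameters (one latent speaker vector per cluster).
    """
    scores = {
        k: penalized_elbo(e, k * speaker_dim, n_segments, criterion)
        for k, e in elbos.items()
    }
    return max(scores, key=scores.get)


# Illustrative, made-up ELBO values for hypotheses with 1..4 speakers.
example_elbos = {1: -5200.0, 2: -4900.0, 3: -4850.0, 4: -4845.0}

if __name__ == "__main__":
    print("raw ELBO choice:", max(example_elbos, key=example_elbos.get))
    print("BIC choice:", select_num_speakers(example_elbos, 200, 10, "bic"))
    print("AIC choice:", select_num_speakers(example_elbos, 200, 10, "aic"))
```

With these invented numbers the raw ELBO slightly favours four speakers, while both penalized scores settle on three, which illustrates why a complexity term is added on top of the ELBO when comparing hypotheses with different numbers of speakers.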


DOI: 10.21437/Interspeech.2018-1841

Cite as: Viñals, I., Gimeno, P., Ortega, A., Miguel, A., Lleida, E. (2018) Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge. Proc. Interspeech 2018, 2803-2807, DOI: 10.21437/Interspeech.2018-1841.


@inproceedings{Viñals2018,
  author={Ignacio Viñals and Pablo Gimeno and Alfonso Ortega and Antonio Miguel and Eduardo Lleida},
  title={Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge.},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2803--2807},
  doi={10.21437/Interspeech.2018-1841},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1841}
}