DNN-based Embeddings for Speaker Diarization in the AuDIaS-UAM System for the Albayzin 2018 IberSPEECH-RTVE Evaluation

Alicia Lozano-Diez, Beltran Labrador, Diego de Benito, Pablo Ramirez, Doroteo T. Toledano


This document describes the three systems submitted by the AuDIaS-UAM team to the Albayzin 2018 IberSPEECH-RTVE speaker diarization evaluation. Two of our systems (the primary and contrastive 1 submissions) are based on embeddings, which are fixed-length representations of a given audio segment obtained from a deep neural network (DNN) trained for speaker classification. The third system (contrastive 2) uses the classical i-vector as the representation of the audio segments. The resulting embeddings or i-vectors are then grouped using Agglomerative Hierarchical Clustering (AHC) to obtain the diarization labels. The new DNN-embedding approach to speaker diarization achieved remarkable performance on the Albayzin development dataset, similar to that of the well-known i-vector approach.
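The clustering step can be illustrated with a minimal sketch. The abstract does not state which similarity measure, linkage criterion or stopping rule the systems use with AHC, so the code below assumes cosine similarity, average linkage and a fixed similarity threshold. These choices, the function name `ahc_diarize` and the threshold value are all hypothetical. Each row of `embeddings` stands for the fixed-length vector of one audio segment, which can be either a DNN embedding or an i-vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def ahc_diarize(embeddings, threshold):
    """Hypothetical average-linkage AHC over per-segment vectors.

    Starts with one cluster per segment and repeatedly merges the pair of
    clusters with the highest average pairwise cosine similarity, stopping
    when no pair reaches `threshold` (an assumed stopping rule). Returns one
    integer speaker label per segment, numbered by first appearance.
    """
    n = len(embeddings)
    # Pairwise segment-to-segment similarity matrix, computed once.
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average linkage: mean similarity over all cross-cluster segment pairs.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, best_pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if s > best:
                    best, best_pair = s, (a, b)
        if best < threshold:
            break  # no remaining pair looks like the same speaker
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy 3-dimensional vectors for five segments (made-up data, two speakers).
segments = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [1.0, 0.0, 0.1],
]
print(ahc_diarize(segments, threshold=0.5))  # → [0, 0, 1, 1, 0]
```

The threshold controls the number of speakers found: raising it towards 1.0 leaves every segment in its own cluster, while lowering it merges everything into a single speaker. In a real system it would have to be tuned on development data.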


DOI: 10.21437/IberSPEECH.2018-46

Cite as: Lozano-Diez, A., Labrador, B., de Benito, D., Ramirez, P., T. Toledano, D. (2018) DNN-based Embeddings for Speaker Diarization in the AuDIaS-UAM System for the Albayzin 2018 IberSPEECH-RTVE Evaluation. Proc. IberSPEECH 2018, 224-226, DOI: 10.21437/IberSPEECH.2018-46.


@inproceedings{Lozano-Diez2018,
  author={Alicia Lozano-Diez and Beltran Labrador and Diego {de Benito} and Pablo Ramirez and Doroteo {T. Toledano}},
  title={{DNN-based Embeddings for Speaker Diarization in the AuDIaS-UAM System for the Albayzin 2018 IberSPEECH-RTVE Evaluation}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={224--226},
  doi={10.21437/IberSPEECH.2018-46},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-46}
}