A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking

Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier


This paper investigates a novel neural scoring method, based on conventional i-vectors, for speaker diarization and linking of large collections of recordings. Trained with a triplet loss, the network projects i-vectors into a space that better separates speakers in terms of cosine similarity. Experiments are run on two French TV collections built from the REPERE [1] and ETAPE [2] campaign corpora, with the system trained on French radio data. Results indicate that the proposed approach outperforms conventional cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring on both within- and cross-recording diarization tasks, with an average Diarization Error Rate reduction of 14%.
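To make the idea of a triplet ranking objective on cosine similarity concrete, the following is a minimal sketch, not the network described in the paper. The function names, the single tanh projection layer, and the margin value of 0.2 are all assumptions chosen for illustration; the abstract does not specify the architecture or hyperparameters.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def project(ivector, weights):
    """Hypothetical one-layer projection of an i-vector: tanh(W x).

    Stands in for the trained network; the real architecture is not
    given in the abstract.
    """
    return np.tanh(weights @ ivector)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet ranking loss on cosine similarity.

    anchor and positive come from the same speaker, negative from a
    different one. The loss is zero once the anchor is more similar to
    the positive than to the negative by at least `margin` (an assumed
    value).
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)


# Toy 2-D "i-vectors": the positive matches the anchor, the negative is orthogonal.
anchor = np.array([1.0, 0.0])
positive = np.array([1.0, 0.0])
negative = np.array([0.0, 1.0])

well_ranked = triplet_loss(anchor, positive, negative)
badly_ranked = triplet_loss(anchor, negative, positive)
```

In this toy case the correctly ranked triplet incurs no loss, while swapping the positive and negative yields a positive loss that training would push down by adjusting the projection weights.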


DOI: 10.21437/Interspeech.2017-270

Cite as: Le Lan, G., Charlet, D., Larcher, A., Meignier, S. (2017) A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking. Proc. Interspeech 2017, 3572-3576, DOI: 10.21437/Interspeech.2017-270.


@inproceedings{Lan2017,
  author={Gaël Le Lan and Delphine Charlet and Anthony Larcher and Sylvain Meignier},
  title={A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3572--3576},
  doi={10.21437/Interspeech.2017-270},
  url={http://dx.doi.org/10.21437/Interspeech.2017-270}
}