ODESSA at Albayzin Speaker Diarization Challenge 2018

Jose Patino, Héctor Delgado, Ruiqing Yin, Hervé Bredin, Claude Barras, Nicholas Evans


This paper describes the ODESSA submissions to the Albayzin Speaker Diarization Challenge 2018, which addresses the diarization of TV shows. This work explores three techniques for representing speech segments: binary key, x-vector, and triplet-loss-based embeddings. While training-free methods such as the binary key technique can be applied easily when training data is limited, training robust neural-embedding extractors is considerably more challenging. However, when training data is plentiful (open-set condition), neural embeddings provide more robust segmentations, giving speaker representations that lead to better diarization performance. The paper also reports our efforts to improve speaker diarization performance through system combination. For systems with a common temporal resolution, fusion is performed at segment level during clustering. When the systems under fusion produce segmentations with arbitrary resolutions, their outputs are combined at solution level. Both approaches to fusion are shown to improve diarization performance.


DOI: 10.21437/IberSPEECH.2018-43

Cite as: Patino, J., Delgado, H., Yin, R., Bredin, H., Barras, C., Evans, N. (2018) ODESSA at Albayzin Speaker Diarization Challenge 2018. Proc. IberSPEECH 2018, 211-215, DOI: 10.21437/IberSPEECH.2018-43.


@inproceedings{Patino2018,
  author={Jose Patino and Héctor Delgado and Ruiqing Yin and Hervé Bredin and Claude Barras and Nicholas Evans},
  title={{ODESSA at Albayzin Speaker Diarization Challenge 2018}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={211--215},
  doi={10.21437/IberSPEECH.2018-43},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-43}
}