The GTM-UVIGO System for Audiovisual Diarization

Eduardo Ramos-Muguerza, Laura Docío-Fernández, José Luis Alba-Castro


This paper describes in detail the audiovisual system deployed by the Multimedia Technologies Group (GTM) of the atlanTTic research center at the University of Vigo for the Albayzin Multimodal Diarization Challenge (MDC), organized at the IberSPEECH 2018 conference. The system is characterized by its use of state-of-the-art face and speaker verification embeddings trained with publicly available deep neural networks. Video and audio tracks are processed separately to obtain matrices of confidence values for each time segment, which are finally fused to make joint decisions on the speaker diarization result.
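The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, so the weighted sum, the `audio_weight` and `threshold` parameters, and the function name `fuse_confidences` are all assumptions made for illustration. Each input is assumed to be a matrix with one row per time segment and one column per candidate identity.

```python
from typing import List, Optional, Sequence


def fuse_confidences(
    audio: Sequence[Sequence[float]],
    video: Sequence[Sequence[float]],
    audio_weight: float = 0.5,   # hypothetical parameter, not from the paper
    threshold: float = 0.5,      # hypothetical parameter, not from the paper
) -> List[Optional[int]]:
    """Fuse two (segments x identities) confidence matrices by weighted sum.

    Returns, for each time segment, the index of the best-scoring identity,
    or None when the best fused score falls below `threshold`.
    """
    if len(audio) != len(video):
        raise ValueError("audio and video must cover the same time segments")

    decisions: List[Optional[int]] = []
    for a_row, v_row in zip(audio, video):
        if len(a_row) != len(v_row):
            raise ValueError("audio and video must score the same identities")
        # Assumed fusion rule: convex combination of the two modalities.
        fused = [
            audio_weight * a + (1.0 - audio_weight) * v
            for a, v in zip(a_row, v_row)
        ]
        best = max(range(len(fused)), key=fused.__getitem__)
        decisions.append(best if fused[best] >= threshold else None)
    return decisions


# Toy example: three time segments, two candidate identities.
audio_scores = [[0.9, 0.1], [0.2, 0.3], [0.4, 0.8]]
video_scores = [[0.7, 0.2], [0.1, 0.2], [0.3, 0.9]]
labels = fuse_confidences(audio_scores, video_scores)
```

In this toy example the second segment is left unassigned because neither modality is confident about any identity. A real system would likely tune the weight and threshold on development data.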


DOI: 10.21437/IberSPEECH.2018-41

Cite as: Ramos-Muguerza, E., Docío-Fernández, L., Alba-Castro, J.L. (2018) The GTM-UVIGO System for Audiovisual Diarization. Proc. IberSPEECH 2018, 204-207, DOI: 10.21437/IberSPEECH.2018-41.


@inproceedings{Ramos-Muguerza2018,
  author={Eduardo Ramos-Muguerza and Laura Docío-Fernández and José Luis Alba-Castro},
  title={{The GTM-UVIGO System for Audiovisual Diarization}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={204--207},
  doi={10.21437/IberSPEECH.2018-41},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-41}
}