The Intelligent Voice System for the IberSPEECH-RTVE 2018 Speaker Diarization Challenge

Abbas Khosravani, Cornelius Glackin, Nazim Dugan, Gérard Chollet, Nigel Cannings


This paper describes the Intelligent Voice (IV) speaker diarization system for the IberSPEECH-RTVE 2018 Speaker Diarization Challenge. We developed a new speaker diarization system that builds on the success of deep neural network (DNN) based speaker embeddings in speaker verification. In contrast to acoustic features such as MFCCs, DNN embeddings are much better at discriminating between speaker identities, especially for speech recorded without constraints on the recording equipment or environment. We perform spectral clustering on our proposed CNN-LSTM-based speaker embeddings to find homogeneous segments and generate speaker log-likelihoods for each frame. A hidden Markov model (HMM) is then used to refine the speaker posterior probabilities by limiting the probability of switching between speakers from one frame to the next. The proposed system is evaluated on the development set (dev2) provided by the challenge.
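The HMM refinement step can be illustrated with a small sketch. The abstract does not specify the HMM topology, the switching probability, or the decoding procedure, so the following is a minimal sketch under stated assumptions: one HMM state per speaker, a single hypothetical parameter `p_switch` giving the probability of leaving the current speaker at each frame (shared equally among the other speakers), a uniform initial distribution, and forward-backward smoothing to obtain per-frame speaker posteriors. The function name `hmm_smooth` is hypothetical, and the frame-level speaker log-likelihoods (which in the described system would come from spectral clustering of CNN-LSTM embeddings) are taken as given input.

```python
import math
from typing import List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_smooth(frame_loglik: List[List[float]], p_switch: float = 0.01) -> List[List[float]]:
    """Forward-backward smoothing of per-frame speaker log-likelihoods.

    frame_loglik[t][k] is the log-likelihood of frame t under speaker k.
    Assumption: one HMM state per speaker; staying with the same speaker has
    probability 1 - p_switch, and p_switch is split equally among the other
    speakers. Returns P(speaker k at frame t | all frames) for every t, k.
    """
    T = len(frame_loglik)
    K = len(frame_loglik[0])
    if K < 2:
        return [[1.0] for _ in range(T)]  # a single speaker owns every frame

    log_stay = math.log(1.0 - p_switch)
    log_move = math.log(p_switch / (K - 1))
    log_trans = [[log_stay if i == j else log_move for j in range(K)] for i in range(K)]
    log_init = [-math.log(K)] * K  # uniform prior over the first frame's speaker

    # Forward pass: alpha[t][j] = log p(x_1..x_t, s_t = j)
    alpha = [[0.0] * K for _ in range(T)]
    for j in range(K):
        alpha[0][j] = log_init[j] + frame_loglik[0][j]
    for t in range(1, T):
        for j in range(K):
            alpha[t][j] = frame_loglik[t][j] + logsumexp(
                [alpha[t - 1][i] + log_trans[i][j] for i in range(K)])

    # Backward pass: beta[t][i] = log p(x_{t+1}..x_T | s_t = i); beta[T-1] = 0
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = logsumexp(
                [log_trans[i][j] + frame_loglik[t + 1][j] + beta[t + 1][j]
                 for j in range(K)])

    # Combine both passes and normalize each frame into a posterior.
    posteriors = []
    for t in range(T):
        scores = [alpha[t][k] + beta[t][k] for k in range(K)]
        norm = logsumexp(scores)
        posteriors.append([math.exp(s - norm) for s in scores])
    return posteriors


# Usage: a single frame that weakly favors speaker 1 inside a run of speaker 0.
noisy = [[0.0, -2.0]] * 5 + [[-1.0, 0.0]] + [[0.0, -2.0]] * 5
posteriors = hmm_smooth(noisy, p_switch=0.01)
labels = [max(range(2), key=lambda k: p[k]) for p in posteriors]
print(labels)
```

With a small `p_switch`, a one-frame excursion to another speaker costs two unlikely transitions, so the isolated frame is reassigned to the surrounding speaker; this is the smoothing effect of limiting speaker switches between frames. Choosing a hard label per frame by argmax of the posteriors is one option; Viterbi decoding over the same HMM would be a common alternative.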


DOI: 10.21437/IberSPEECH.2018-48

Cite as: Khosravani, A., Glackin, C., Dugan, N., Chollet, G., Cannings, N. (2018) The Intelligent Voice System for the IberSPEECH-RTVE 2018 Speaker Diarization Challenge. Proc. IberSPEECH 2018, 231-235, DOI: 10.21437/IberSPEECH.2018-48.


@inproceedings{Khosravani2018,
  author={Abbas Khosravani and Cornelius Glackin and Nazim Dugan and Gérard Chollet and Nigel Cannings},
  title={{The Intelligent Voice System for the IberSPEECH-RTVE 2018 Speaker Diarization Challenge}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={231--235},
  doi={10.21437/IberSPEECH.2018-48},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-48}
}