ISCA Archive Interspeech 2019

Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II

Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Anastasia Avdeeva, Artem Gorlanov, Alexandr Kozlov

This paper describes the ITMO University (DI-IT team) speaker diarization systems submitted to DIHARD Challenge II. As in DIHARD I, this challenge focuses on the diarization task for microphone recordings in varied, difficult conditions. According to the results of the previous DIHARD I Challenge, state-of-the-art diarization systems are based on x-vector embeddings. Such embeddings are clustered using the agglomerative hierarchical clustering (AHC) algorithm with PLDA scoring. The current research continues the investigation of the effectiveness of deep speaker embeddings for the speaker diarization task. This paper explores new types of embedding extractors with different deep neural network architectures and training strategies. We also used AHC to cluster the embeddings. As an alternative to PLDA scoring in our AHC procedure, we used a discriminatively trained cosine similarity metric learning (CSML) model for scoring. Moreover, we focused on tuning the AHC threshold to the specific speech quality. For this purpose, an environment classifier was first trained on the development set to predict acoustic conditions. We show that this threshold adaptation scheme reduces the diarization error rate compared to a single AHC threshold shared across all conditions.
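The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain cosine similarity stands in for the trained CSML scorer, average linkage is an assumed merge rule, and the condition names, threshold values, and function names (`ahc`, `diarize`) are hypothetical. It only shows how a stopping threshold chosen per predicted acoustic condition changes the AHC result.

```python
import math

# Hypothetical per-condition stopping thresholds (illustrative values only,
# not taken from the paper).
THRESHOLDS = {"clean": 0.5, "noisy": 0.05}
DEFAULT_THRESHOLD = 0.4


def cosine(a, b):
    """Plain cosine similarity; a stand-in for the trained CSML scorer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def ahc(embeddings, threshold):
    """Average-linkage AHC: repeatedly merge the most similar pair of
    clusters and stop once the best score drops below the threshold.
    Returns one cluster label per input segment."""
    clusters = [[i] for i in range(len(embeddings))]

    def score(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best, a, b = max(
            (score(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def diarize(embeddings, condition):
    """Pick the AHC threshold from the predicted acoustic condition,
    falling back to a shared default for unknown conditions."""
    threshold = THRESHOLDS.get(condition, DEFAULT_THRESHOLD)
    return ahc(embeddings, threshold)
```

For example, calling `diarize(segments, "clean")` and `diarize(segments, "noisy")` on the same segment embeddings can yield a different number of speakers, because the lower threshold lets AHC continue merging less similar clusters. In a real system the condition label would come from the environment classifier and the thresholds would be tuned on development data.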

doi: 10.21437/Interspeech.2019-2757

Cite as: Novoselov, S., Gusev, A., Ivanov, A., Pekhovsky, T., Shulipa, A., Avdeeva, A., Gorlanov, A., Kozlov, A. (2019) Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II. Proc. Interspeech 2019, 1003-1007, doi: 10.21437/Interspeech.2019-2757

  author={Sergey Novoselov and Aleksei Gusev and Artem Ivanov and Timur Pekhovsky and Andrey Shulipa and Anastasia Avdeeva and Artem Gorlanov and Alexandr Kozlov},
  title={{Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II}},
  booktitle={Proc. Interspeech 2019},