Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Speaker Diarization Challenge

Valter Akira Miasato Filho, Diego Augusto Silva, Luis Gustavo Depra Cuozzo


DIHARD is a new annual speaker diarization challenge focusing on "hard" domains, i.e. datasets on which current state-of-the-art systems are expected to perform poorly. We present our diarization system, a neural network jointly optimized for speaker embedding learning, speech activity detection, and overlap detection. We describe our network topology and the affinity matrix loss objective function responsible for learning the frame-wise speaker embeddings. The outputs of the network are then clustered with k-means, and each frame detected as speech is assigned to one or two speakers, depending on the overlap detection. For training data, we used two well-known meeting corpora, the AMI and ICSI datasets, together with the samples provided by the DIHARD challenge. To further enhance our system, we present three data augmentation settings: the first is a naive concatenation of isolated speaker utterances from non-diarization datasets, which generates artificial diarization prompts; the second is a simple noise addition with sampled signal-to-noise ratios; the third applies noise suppression to the development data. All training setups are compared in terms of diarization error rate and mutual information on the evaluation set of the challenge.
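The paper does not publish code, but the clustering and assignment step described above can be sketched under stated assumptions: given per-frame embeddings and centroids obtained from k-means, a non-speech frame gets no speaker label, a single-speaker speech frame gets the nearest centroid, and an overlapped frame gets the two nearest centroids. The function name `assign_speakers` and the boolean masks below are illustrative, not from the paper.

```python
import numpy as np

def assign_speakers(embeddings, centroids, is_speech, is_overlap):
    """Per-frame speaker assignment following the described pipeline.

    embeddings : (n_frames, dim) frame-wise speaker embeddings
    centroids  : (n_clusters, dim) k-means centroids
    is_speech  : length-n_frames booleans from speech activity detection
    is_overlap : length-n_frames booleans from overlap detection
    Returns a list of speaker-index lists (0, 1, or 2 speakers per frame).
    """
    # Pairwise squared Euclidean distances, shape (n_frames, n_clusters).
    d = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d, axis=1)  # clusters sorted by distance per frame
    labels = []
    for i in range(len(embeddings)):
        if not is_speech[i]:
            labels.append([])                   # non-speech: no speaker
        elif is_overlap[i]:
            labels.append(list(order[i, :2]))   # overlap: two nearest
        else:
            labels.append([order[i, 0]])        # single speaker: nearest
    return labels
```

In practice the centroids would come from running k-means on the speech frames only, with the number of clusters set to the (estimated or oracle) number of speakers in the recording.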


DOI: 10.21437/Interspeech.2018-2304

Cite as: Miasato Filho, V.A., Silva, D.A., Depra Cuozzo, L.G. (2018) Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Speaker Diarization Challenge. Proc. Interspeech 2018, 2818-2822, DOI: 10.21437/Interspeech.2018-2304.


@inproceedings{MiasatoFilho2018,
  author={Valter Akira {Miasato Filho} and Diego Augusto Silva and Luis Gustavo {Depra Cuozzo}},
  title={Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Speaker Diarization Challenge},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2818--2822},
  doi={10.21437/Interspeech.2018-2304},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2304}
}