Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization

Nam Le, Jean-Marc Odobez


Learning a good speaker embedding is critical for many speech processing tasks, including recognition, verification, and diarization. To this end, we propose a complementary optimization goal called intra-class loss to improve deep speaker embeddings learned with triplet loss. This loss function is formulated as a soft constraint on the averaged pairwise distance between samples from the same class. Its goal is to prevent these samples from scattering within the embedding space, thereby increasing intra-class compactness. When intra-class loss is jointly optimized with triplet loss, we observe two major improvements: the deep embedding network achieves a more robust and discriminative representation, and the training process is more stable with a faster convergence rate. We conduct experiments on two large public benchmark datasets for speaker verification, VoxCeleb and VoxForge. The results show that intra-class loss helps accelerate the convergence of deep network training and significantly improves the overall performance of the resulting embeddings.
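The joint objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the margin value, the weighting factor `lam`, and the use of squared Euclidean distance are assumptions for the sake of the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull anchor-positive pairs together and
    push anchor-negative pairs apart by at least `margin`."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive distances
    d_an = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative distances
    return np.mean(np.maximum(d_ap - d_an + margin, 0.0))

def intra_class_loss(embeddings, labels):
    """Averaged pairwise squared distance between samples of the same class,
    acting as a soft constraint against scattering within each class."""
    total, count = 0.0, 0
    for c in np.unique(labels):
        group = embeddings[labels == c]
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                total += np.sum((group[i] - group[j]) ** 2)
                count += 1
    return total / max(count, 1)

def joint_loss(anchor, positive, negative, embeddings, labels, lam=0.1):
    """Triplet loss regularized by intra-class loss; `lam` is an
    assumed trade-off weight, not a value from the paper."""
    return (triplet_loss(anchor, positive, negative)
            + lam * intra_class_loss(embeddings, labels))
```

In practice both terms would be computed over the same mini-batch of embeddings and minimized by gradient descent; the intra-class term only adds a penalty when same-class samples drift apart, so it leaves well-clustered classes untouched.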


DOI: 10.21437/Interspeech.2018-1685

Cite as: Le, N., Odobez, J. (2018) Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization. Proc. Interspeech 2018, 2257-2261, DOI: 10.21437/Interspeech.2018-1685.


@inproceedings{Le2018,
  author={Nam Le and Jean-Marc Odobez},
  title={Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2257--2261},
  doi={10.21437/Interspeech.2018-1685},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1685}
}