Angular Softmax for Short-Duration Text-independent Speaker Verification

Zili Huang, Shuai Wang, Kai Yu


Recently, researchers have proposed deep learning based end-to-end speaker verification (SV) systems that achieve competitive results compared with the standard i-vector approach. Beyond the deep learning architecture, the optimization metric, such as the softmax loss or triplet loss, is important for extracting speaker embeddings that are discriminative and generalize well to unseen speakers. In this paper, the angular softmax (A-softmax) loss is introduced to improve speaker embedding quality. It is investigated in two SV frameworks: a CNN based end-to-end SV framework, and an i-vector SV framework in which deep discriminant analysis is used for channel compensation. Experimental results on a short-duration text-independent speaker verification dataset generated from SRE show that A-softmax achieves significant performance improvements over other metrics in both frameworks.
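As background, the A-softmax loss modifies the standard softmax by L2-normalizing the class weight vectors and replacing the target-class logit ||x||cos(θ) with ||x||ψ(θ), where ψ(θ) = (−1)^k cos(mθ) − 2k for θ ∈ [kπ/m, (k+1)π/m], so that an integer margin m sharpens the angular decision boundary. The following NumPy sketch illustrates this idea; function names and shapes are illustrative, not taken from the paper:

```python
import numpy as np

def a_softmax_logits(x, W, labels, m=4):
    """Minimal sketch of A-softmax logits.

    x: (N, D) features; W: (D, C) class weights; labels: (N,) target ids.
    Columns of W are unit-normalized, so the logit for class j is
    ||x|| * cos(theta_j). For the target class, cos(theta) is replaced by
    psi(theta) = (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m],
    a monotonically decreasing surrogate that enforces an angular margin m.
    """
    W = W / np.linalg.norm(W, axis=0, keepdims=True)   # unit-norm class weights
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # (N, 1) feature norms
    cos_theta = np.clip(x @ W / x_norm, -1.0, 1.0)     # (N, C) cosines
    logits = x_norm * cos_theta                        # standard ||x|| cos(theta)
    rows = np.arange(len(labels))
    theta = np.arccos(cos_theta[rows, labels])         # target-class angles
    k = np.floor(m * theta / np.pi)                    # interval index for psi
    psi = ((-1.0) ** k) * np.cos(m * theta) - 2.0 * k  # margin-sharpened cosine
    logits[rows, labels] = x_norm[:, 0] * psi          # replace target logits
    return logits

def a_softmax_loss(x, W, labels, m=4):
    """Mean cross-entropy over A-softmax logits."""
    logits = a_softmax_logits(x, W, labels, m)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With m = 1, ψ(θ) reduces to cos(θ) and the loss falls back to a softmax cross-entropy with normalized class weights; larger m shrinks the target-class logit, making the training objective strictly harder and the resulting embeddings more angularly discriminative.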


DOI: 10.21437/Interspeech.2018-1545

Cite as: Huang, Z., Wang, S., Yu, K. (2018) Angular Softmax for Short-Duration Text-independent Speaker Verification. Proc. Interspeech 2018, 3623-3627, DOI: 10.21437/Interspeech.2018-1545.


@inproceedings{Huang2018,
  author={Zili Huang and Shuai Wang and Kai Yu},
  title={Angular Softmax for Short-Duration Text-independent Speaker Verification},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3623--3627},
  doi={10.21437/Interspeech.2018-1545},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1545}
}