Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System

Soonshin Seo, Daniel Jun Rim, Minkyu Lim, Donghyun Lee, Hosung Park, Junseok Oh, Changmin Kim, Ji-Hwan Kim


The objective of speaker verification is to accept or reject the claim that the input speech belongs to an enrolled speaker. Traditionally, i-vectors or speaker embedding systems such as d-vectors, which represent speaker information, have shown high performance when combined with similarity metrics at the backend. Recently, end-to-end systems have been proposed that build on these earlier speaker embedding approaches and require no additional strategy after extraction. Among the various models, CNN-based end-to-end systems show state-of-the-art performance. A CNN-based model is trained to classify multiple speakers, and speaker embeddings are then extracted from it.
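
As a rough illustration of the accept/reject decision described above, the following minimal sketch scores a test embedding against an enrolled embedding using cosine similarity. The function names, the toy vectors, and the threshold value of 0.5 are assumptions made for illustration; they are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enrolled_embedding, test_embedding, threshold=0.5):
    """Accept the identity claim when the similarity reaches the threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out trials.
    """
    return cosine_similarity(enrolled_embedding, test_embedding) >= threshold


# Toy embeddings (illustrative values only).
enrolled = [0.2, 0.9, 0.1]
same_speaker = [0.25, 0.85, 0.05]
other_speaker = [0.9, -0.1, 0.3]

print(verify(enrolled, same_speaker))
print(verify(enrolled, other_speaker))
```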

In this paper, we propose shortcut connections based deep speaker embeddings for an end-to-end speaker verification system. We construct a modified ResNet-18 model in which the activation outputs of the bottleneck architecture have shortcut connections to the speaker embeddings. Deep speaker embeddings are extracted by joint training in an end-to-end manner. The model was constructed without other sophisticated methods such as length normalization or additive margin softmax loss. When tested on VoxCeleb1, a data set recorded under unconstrained conditions, the proposed model achieved an EER of 3.03% with high-dimensional deep speaker embeddings. This is the state-of-the-art performance for an end-to-end speaker verification model on VoxCeleb1.
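
For readers unfamiliar with the metric, the equal error rate (EER) reported above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed trial scores. The function name and the toy scores are assumptions for illustration and do not reproduce the paper's evaluation procedure or its data.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    target_scores: similarity scores for same-speaker trials.
    nontarget_scores: similarity scores for different-speaker trials.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False rejection rate: same-speaker trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores (illustrative values only).
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25
```

With these toy scores, one of four same-speaker trials falls below the threshold of 0.6 and one of four different-speaker trials reaches it, so both error rates are 25%.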


DOI: 10.21437/Interspeech.2019-2195

Cite as: Seo, S., Rim, D.J., Lim, M., Lee, D., Park, H., Oh, J., Kim, C., Kim, J. (2019) Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System. Proc. Interspeech 2019, 2928-2932, DOI: 10.21437/Interspeech.2019-2195.


@inproceedings{Seo2019,
  author={Soonshin Seo and Daniel Jun Rim and Minkyu Lim and Donghyun Lee and Hosung Park and Junseok Oh and Changmin Kim and Ji-Hwan Kim},
  title={{Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2928--2932},
  doi={10.21437/Interspeech.2019-2195},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2195}
}