Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation

Ignacio Viñals, Dayana Ribas, Victoria Mingote, Jorge Llombart, Pablo Gimeno, Antonio Miguel, Alfonso Ortega, Eduardo Lleida


Speaker recognition systems very often do not take phonetic information into account explicitly. To gain insight into this line of research, we have studied the use of phonetic information in the embedding extraction process of automatic speaker verification systems in two different ways: on the one hand, using the well-known i-vector paradigm and, on the other hand, using Wide Residual Networks (WRN) with Time-Delay Neural Networks (TDNN) and Self-Attention Mechanisms. The phonetic information is provided by a WRN with TDNN using 1D convolutional layers, specifically trained for this purpose. These two approaches, along with the widely used x-vector system based on the Kaldi toolkit, were submitted to the 2018 NIST Speaker Recognition Evaluation. As the back-end, all of these representations used a standard PLDA classifier with ad-hoc configurations for each system and in-domain adaptation. The results obtained in NIST SRE 2018 show that our methods are very promising and that it is worth continuing to work on them to improve their performance.
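The abstract does not spell out how a self-attention mechanism turns frame-level network outputs into a single fixed-size embedding. One common formulation is single-head attentive pooling: each frame gets a scalar score, the scores are normalized with a softmax, and the embedding is the attention-weighted average of the frames. The minimal pure-Python sketch below assumes that formulation; it is not the submitted systems' implementation, and the parameter names `w_att` and `v_att` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pooling(
    frames: Sequence[Sequence[float]],
    w_att: Matrix,            # hypothetical projection matrix (H x D)
    v_att: Sequence[float],   # hypothetical scoring vector (length H)
) -> Vector:
    """Collapse T frame-level vectors of dimension D into one D-dim embedding.

    score_t = v_att^T tanh(w_att h_t)
    alpha   = softmax(score_1, ..., score_T)
    e       = sum_t alpha_t h_t
    """
    scores = []
    for h in frames:
        hidden = [math.tanh(x) for x in matvec(w_att, h)]
        scores.append(sum(a * b for a, b in zip(v_att, hidden)))
    alphas = softmax(scores)
    dim = len(frames[0])
    # Weighted average of the frames, one output dimension at a time.
    return [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
```

With `v_att` set to all zeros every frame receives the same weight, so the pooling reduces to a plain temporal mean; non-zero parameters let the layer emphasize the frames it scores as more informative, which is the motivation for attention over simple averaging.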


DOI: 10.21437/Interspeech.2019-2417

Cite as: Viñals, I., Ribas, D., Mingote, V., Llombart, J., Gimeno, P., Miguel, A., Ortega, A., Lleida, E. (2019) Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation. Proc. Interspeech 2019, 4310-4314, DOI: 10.21437/Interspeech.2019-2417.


@inproceedings{Viñals2019,
  author={Ignacio Viñals and Dayana Ribas and Victoria Mingote and Jorge Llombart and Pablo Gimeno and Antonio Miguel and Alfonso Ortega and Eduardo Lleida},
  title={{Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4310--4314},
  doi={10.21437/Interspeech.2019-2417},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2417}
}