ISCA Archive Odyssey 2022

STC Speaker Recognition System for the NIST SRE 2021

Galina Lavrentyeva, Sergey Novoselov, Vladimir Volokhov, Anastasia Avdeeva, Aleksei Gusev, Alisa Vinogradova, Igor Korsunov, Alexander Kozlov, Timur Pekhovsky, Andrey Shulipa, Evgeny Smirnov, Vasily Galyuk

The 2021 Speaker Recognition Evaluation (SRE21) is the latest in the series of open speaker recognition evaluations conducted by the US National Institute of Standards and Technology (NIST). In 2021 the challenge focused on person detection over conversational telephone speech (CTS) and audio from video. It introduced new cross-channel and multilingual speaker recognition tasks by providing a diverse evaluation corpus. This paper summarizes the single systems developed by STC Ltd. for the NIST 2021 Speaker Recognition Evaluation under both the fixed and open training conditions. During NIST SRE21 we focused on training state-of-the-art deep speaker embedding extractors, such as ResNets and ECAPA networks, with additive angular margin based loss functions. Additionally, inspired by the recent success of wav2vec 2.0 features in automatic speech recognition, we explored the effectiveness of this approach for speaker verification. According to our observations, fine-tuning the large pretrained wav2vec 2.0 model yields our best performing systems for the open track. Our experiments with wav2vec 2.0 based extractors for the fixed track showed that unsupervised autoregressive pretraining with the Contrastive Predictive Coding loss opens the door to training powerful transformer-based extractors from raw speech signals. For the video modality, we developed our best solution with the RetinaFace face detector and a deep ResNet face embedding extractor trained on large face image datasets. We note that our single and fusion systems demonstrated strong performance in SRE21.
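
For readers unfamiliar with the additive angular margin loss mentioned in the abstract, the following is a minimal sketch of the general technique for a single training example, not the authors' implementation. The function name `aam_softmax_loss` and the default `margin` and `scale` values are illustrative assumptions; the abstract does not state the hyperparameters used.

```python
import math


def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Additive angular margin softmax loss for one example.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker class.
    margin, scale: illustrative values (assumptions, not from the paper).
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if i == target:
            # Add the margin to the target angle only; clamp at pi so the
            # penalised cosine never starts increasing again.
            theta = math.acos(c)
            c = math.cos(min(theta + margin, math.pi))
        logits.append(scale * c)

    # Numerically stable cross-entropy: log-sum-exp minus the target logit.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Usage: the margin makes the same example harder to classify.
plain = aam_softmax_loss([0.8, 0.1, -0.3], target=0, margin=0.0)
with_margin = aam_softmax_loss([0.8, 0.1, -0.3], target=0, margin=0.2)
```

With `margin=0.0` the function reduces to ordinary scaled-cosine softmax cross-entropy; a positive margin lowers the target logit, so the loss on the same example can only grow, which pushes embeddings of the same speaker closer together in angle.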

doi: 10.21437/Odyssey.2022-49

Cite as: Lavrentyeva, G., Novoselov, S., Volokhov, V., Avdeeva, A., Gusev, A., Vinogradova, A., Korsunov, I., Kozlov, A., Pekhovsky, T., Shulipa, A., Smirnov, E., Galyuk, V. (2022) STC Speaker Recognition System for the NIST SRE 2021. Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 354-361, doi: 10.21437/Odyssey.2022-49

  author={Galina Lavrentyeva and Sergey Novoselov and Vladimir Volokhov and Anastasia Avdeeva and Aleksei Gusev and Alisa Vinogradova and Igor Korsunov and Alexander Kozlov and Timur Pekhovsky and Andrey Shulipa and Evgeny Smirnov and Vasily Galyuk},
  title={{STC Speaker Recognition System for the NIST SRE 2021}},
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2022)},