The 2018 NIST Speaker Recognition Evaluation

Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero

In 2018, the U.S. National Institute of Standards and Technology (NIST) conducted the most recent in an ongoing series of speaker recognition evaluations (SRE). SRE18 was organized in a similar manner to SRE16, focusing on speaker detection over conversational telephony speech (CTS) collected outside North America. SRE18 also featured several new aspects, including two new data domains, namely voice over internet protocol (VoIP) and audio extracted from amateur online videos (AfV), as well as a new language (Tunisian Arabic). A total of 78 organizations (forming 48 teams) from academia and industry participated in SRE18 and submitted 129 valid system outputs under the fixed and open training conditions first introduced in SRE16. This paper presents an overview of the evaluation and several analyses of system performance for all primary conditions in SRE18. The evaluation results suggest that 1) speaker recognition on AfV was more challenging than on telephony data, 2) speaker representations (a.k.a. embeddings) extracted using end-to-end neural network frameworks were most effective, 3) top-performing systems exhibited similar performance, and 4) the greatest performance improvements were largely due to data augmentation, the use of extended and more complex models for data representation, and effective use of the provided development sets.

DOI: 10.21437/Interspeech.2019-1351

Cite as: Sadjadi, S.O., Greenberg, C., Singer, E., Reynolds, D., Mason, L., Hernandez-Cordero, J. (2019) The 2018 NIST Speaker Recognition Evaluation. Proc. Interspeech 2019, 1483-1487, DOI: 10.21437/Interspeech.2019-1351.

  author={Seyed Omid Sadjadi and Craig Greenberg and Elliot Singer and Douglas Reynolds and Lisa Mason and Jaime Hernandez-Cordero},
  title={{The 2018 NIST Speaker Recognition Evaluation}},
  booktitle={Proc. Interspeech 2019},