Since 1996, the National Institute of Standards and Technology (NIST) has coordinated a series of annual or biennial open evaluations of automatic speaker recognition technology. These evaluations have concentrated on the task of single-speaker detection using spontaneous speech from conversational telephone calls or one-on-one interviews, recorded over ordinary telephone channels or with room microphones. System performance has been assessed in relation to a variety of factors, notably the quantity of training and test speech supplied, the speaking styles used, and the types and variability of the recording channels. While English has been the primary language, several of the evaluations have included substantial quantities of speech from multilingual speakers to allow examination of language and cross-language effects. More recently, initial efforts have been made to consider the effects of voice aging and varying vocal effort on performance. We discuss the considerations that have gone into planning and organizing these and a few related evaluations, the performance metrics that have been employed, the considerable progress observed over time, and the ongoing plans for further evaluations in 2012 and beyond.
Cite as: Martin, A. (2012) The NIST speaker recognition evaluations. Proc. The Speaker and Language Recognition Workshop (Odyssey 2012), (abstract)
@inproceedings{martin12_odyssey, author={Alvin Martin}, title={{The NIST speaker recognition evaluations}}, year=2012, booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2012)}, pages={(abstract)} }