In this paper, we propose a novel speaker verification method that accepts or rejects a claimant according to the claimant's rank among a large number of speaker models, instead of relying on score normalization such as T-norm or Z-norm. The method achieves higher speaker verification accuracy than the standard T-norm. However, like T-norm, which requires calculating likelihoods for many cohort models, it is computationally expensive. Hence, we also discuss a speed-up that selects a cohort subset for each target speaker in the training stage. This data-driven approach can significantly reduce computation, resulting in faster speaker verification decisions. We conducted text-independent speaker verification experiments using a large-scale Japanese speaker recognition evaluation corpus constructed by the National Research Institute of Police Science. The proposed method achieved an equal error rate of 2.2%, while T-norm obtained 2.7%.
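The contrast between a rank-based decision and T-norm can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example scores, the tie handling, and the use of the population standard deviation are all assumptions made for illustration. The abstract specifies only that the decision uses the claimant's rank among many speaker models, and that T-norm needs likelihoods for many cohort models.

```python
from statistics import mean, pstdev


def claimant_rank(target_score, cohort_scores):
    """Return the 1-based rank of the claimed speaker's score (1 = best).

    Assumption: higher scores (e.g. log-likelihoods) are better, and ties
    are resolved in favour of the claimed speaker.
    """
    return 1 + sum(1 for s in cohort_scores if s > target_score)


def rank_decision(target_score, cohort_scores, rank_threshold):
    """Accept the claim if the claimed model ranks within the threshold."""
    return claimant_rank(target_score, cohort_scores) <= rank_threshold


def tnorm_score(target_score, cohort_scores):
    """Standard T-norm: normalize by the cohort score mean and deviation.

    Assumption: population standard deviation; implementations differ.
    """
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    return (target_score - mu) / sigma


# Hypothetical log-likelihood scores of one test utterance.
cohort = [-52.0, -50.5, -49.0, -55.0]  # scores against other speaker models
target = -49.5                          # score against the claimed model

rank = claimant_rank(target, cohort)
accepted = rank_decision(target, cohort, rank_threshold=2)
normalized = tnorm_score(target, cohort)
```

Both decisions need the test utterance scored against every model in `cohort`, which is the cost that the per-speaker cohort subset selection described above aims to reduce.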
Cite as: Okamoto, H., Tsuge, S., Abdelwahab, A., Nishida, M., Horiuchi, Y., Kuroiwa, S. (2009) Text-independent speaker verification using rank threshold in large number of speaker models. Proc. Interspeech 2009, 2367-2370, doi: 10.21437/Interspeech.2009-400
@inproceedings{okamoto09_interspeech,
  author={Haruka Okamoto and Satoru Tsuge and Amira Abdelwahab and Masafumi Nishida and Yasuo Horiuchi and Shingo Kuroiwa},
  title={{Text-independent speaker verification using rank threshold in large number of speaker models}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2367--2370},
  doi={10.21437/Interspeech.2009-400}
}