We present an effective, practical solution to the problem of uncertainty modeling in text-dependent speaker recognition, where "uncertainty" refers to the fact that feature vectors used for speaker recognition are necessarily noisy in the statistical sense if they are extracted from utterances of short duration. The idea is to apply the I-Vector Backend probability model at the level of individual Gaussian mixture components rather than at the supervector level. We show that, unlike the I-Vector Backend, this approach can be implemented in a way that makes reasonable computational demands at verification time. Uncertainty modeling enables us to achieve error rate reductions of up to 25% on the RSR Part III speaker verification task, compared to an implementation of the Joint Density Backend [8] that treats point estimates of supervector features as being reliable.

Cite as

Kenny, P., Stafylakis, T., Alam, J., Gupta, V., Kockmann, M. (2016) Uncertainty Modeling Without Subspace Methods For Text-Dependent Speaker Recognition. Proc. Odyssey 2016, 16-23.

Bibtex

@inproceedings{Kenny+2016,
  author={Patrick Kenny and Themos Stafylakis and Jahangir Alam and Vishwa Gupta and Marcel Kockmann},
  title={Uncertainty Modeling Without Subspace Methods For Text-Dependent Speaker Recognition},
  year=2016,
  booktitle={Odyssey 2016},
  doi={10.21437/Odyssey.2016-3},
  url={http://dx.doi.org/10.21437/Odyssey.2016-3},
  pages={16--23}
}