Subspace-based techniques, such as i-vectors and Joint Factor Analysis (JFA), have been shown to provide state-of-the-art performance for fixed-phrase text-dependent speaker verification. However, the error rates of such systems on the random digit task of the RSR dataset are higher than those of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) approach. In this paper, we aim to improve the i-vector system by normalizing the content of the enrollment data to match the test data. We estimate an i-vector for each frame of a speech utterance (also called online i-vectors). Using these online i-vectors, the largest similarity scores across frames between enrollment and test are taken to obtain speaker verification scores. Experiments on Part 3 of the RSR corpus show that the proposed approach achieves a 12% relative improvement in equal error rate over a GMM-UBM based baseline system.
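The frame-level scoring described in the abstract can be illustrated with a minimal sketch. The abstract only states that the largest similarity scores across frames are taken; the use of cosine similarity, the averaging of per-frame maxima over the test utterance, and the function name `max_similarity_score` are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np

def max_similarity_score(enroll_ivecs, test_ivecs):
    """Score a trial from frame-level (online) i-vectors.

    enroll_ivecs: array of shape (n_enroll_frames, dim)
    test_ivecs:   array of shape (n_test_frames, dim)

    Assumptions (not from the paper): cosine similarity is the
    frame-to-frame measure, and the per-frame maxima are averaged.
    """
    enroll = np.asarray(enroll_ivecs, dtype=float)
    test = np.asarray(test_ivecs, dtype=float)

    # Length-normalize so that a dot product equals cosine similarity.
    enroll = enroll / np.linalg.norm(enroll, axis=1, keepdims=True)
    test = test / np.linalg.norm(test, axis=1, keepdims=True)

    # Similarity of every test frame against every enrollment frame.
    sim = test @ enroll.T  # shape: (n_test_frames, n_enroll_frames)

    # For each test frame, keep the best-matching enrollment frame,
    # then average those maxima to obtain one score per trial.
    return float(sim.max(axis=1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enroll = rng.normal(size=(200, 50))  # synthetic enrollment frames
    test = rng.normal(size=(120, 50))    # synthetic test frames
    print(max_similarity_score(enroll, test))
```

Taking the maximum over enrollment frames means each test frame is compared only with the enrollment frame it resembles most, which is one simple way to align content when the digit order differs between enrollment and test.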
Cite as: Dey, S., Madikeri, S., Motlicek, P., Ferras, M. (2017) Content Normalization for Text-Dependent Speaker Verification. Proc. Interspeech 2017, 1482-1486, doi: 10.21437/Interspeech.2017-1419
@inproceedings{dey17_interspeech,
  author={Subhadeep Dey and Srikanth Madikeri and Petr Motlicek and Marc Ferras},
  title={{Content Normalization for Text-Dependent Speaker Verification}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1482--1486},
  doi={10.21437/Interspeech.2017-1419}
}