We tackle the problem of text-dependent speaker verification using a version of Joint Factor Analysis (JFA) in which speaker-phrase variability is modeled with a factorial prior and channel variability with a subspace prior. We implemented this using Zhao and Dong's variational Bayes algorithm, an extension of Vogt's Gauss-Seidel method that supports UBM adaptation to the speaker and channel effects in enrollment and test utterances. We report results on the RSR2015 dataset obtained with two types of likelihood ratio and several strategies for UBM adaptation. We found that using a large UBM and decomposing JFA into a feature extractor and a simple back end classifier (in a way broadly analogous to the i-vector/PLDA cascade) gives better results than using likelihood ratios of either type to make verification decisions. This method involves no UBM adaptation other than to the lexical content of utterances, and it is based on Vogt's algorithm rather than Zhao and Dong's. It results in an equal error rate of 0.5% on the RSR2015 evaluation set.
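The equal error rate quoted above is the operating point at which the miss rate (target trials rejected) equals the false-alarm rate (nontarget trials accepted). As a minimal sketch of how such a figure can be computed from verification scores, the following Python function sweeps a threshold over the observed scores. The function name `equal_error_rate` and the toy scores are illustrative assumptions, not taken from the paper, and the paper's own scoring tools may interpolate the operating point differently.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Return an EER estimate (as a fraction) from two sets of trial scores.

    Illustrative sketch: higher scores are assumed to mean "same speaker
    and phrase". No interpolation between thresholds is performed.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # Miss rate: fraction of target trials scoring below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of nontarget trials at or above the threshold.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two rates are closest and average them.
    i = np.argmin(np.abs(miss - false_alarm))
    return (miss[i] + false_alarm[i]) / 2.0

# Toy scores (hypothetical): one target trial and one nontarget trial overlap.
toy_target = [0.5, 2.0, 3.0, 4.0]
toy_nontarget = [-1.0, 0.0, 1.0, 2.5]
toy_eer = equal_error_rate(toy_target, toy_nontarget)
```

With a realistic number of trials the threshold sweep is fine-grained enough that averaging the two closest rates gives a stable estimate; on very small trial lists the result is coarse.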
Cite as: Kenny, P., Stafylakis, T., Jahangir, A., Ouellet, P., Kockmann, M. (2014) Joint Factor Analysis for Text-Dependent Speaker Verification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2014), 200-207, doi: 10.21437/Odyssey.2014-31
@inproceedings{kenny14b_odyssey,
  author={Patrick Kenny and Themos Stafylakis and Alam Jahangir and Pierre Ouellet and Marcel Kockmann},
  title={{Joint Factor Analysis for Text-Dependent Speaker Verification}},
  year=2014,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)},
  pages={200--207},
  doi={10.21437/Odyssey.2014-31}
}