Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification

Jianbo Ma, Saad Irtza, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah


Text-dependent, short-duration speaker verification involves two challenges. The primary challenge is verifying the speaker’s identity; a secondary challenge is often verifying the lexical content of the pass-phrase. In this paper, we propose two sub-systems that handle these tasks in parallel: one models speaker identity under the assumption that the lexical content is known, and the other models lexical content in a speaker-dependent manner. The text-dependent speaker verification sub-system is based on hidden Markov models, and the lexical content verification sub-system is based on models of speech segments, with a distinct Gaussian mixture model for each segment. Furthermore, a mixture selection method based on Kullback-Leibler (KL) divergence is applied to refine the lexical content sub-system by making its models more discriminative. Experiments on Part 1 of the RedDots database showed that the proposed combination of the two sub-systems outperformed the baseline system by 39.8%, 51.1% and 37.3% in terms of the ‘imposter_correct’, ‘target_wrong’ and ‘imposter_wrong’ metrics, respectively.
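
To give a rough sense of what KL-divergence-based mixture selection can look like, the following is a minimal sketch and not the authors' implementation. The abstract does not state the selection criterion, so several things here are assumptions: components are diagonal-covariance Gaussians, each segment model is index-aligned with a reference model (as it would be if both were adapted from one shared background model), and the components kept are those that diverge most from their reference counterpart. The function names `kl_diag_gauss` and `select_mixtures` and the parameter `top_k` are hypothetical.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    mu_p, var_p, mu_q, var_q = (
        np.asarray(a, dtype=float) for a in (mu_p, var_p, mu_q, var_q)
    )
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def select_mixtures(seg_means, seg_vars, ref_means, ref_vars, top_k):
    """Pick the top_k components that differ most from the reference model.

    Assumption (not from the paper): component i of the segment model
    corresponds to component i of the reference model.
    Returns the selected indices (sorted ascending) and all KL values.
    """
    kl = np.array([
        kl_diag_gauss(mp, vp, mq, vq)
        for mp, vp, mq, vq in zip(seg_means, seg_vars, ref_means, ref_vars)
    ])
    # Stable sort on negated values: largest divergence first.
    order = np.argsort(-kl, kind="stable")
    return np.sort(order[:top_k]), kl


if __name__ == "__main__":
    # Toy 3-component, 2-dimensional models with unit variances.
    ref_means = np.zeros((3, 2))
    ref_vars = np.ones((3, 2))
    seg_means = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
    seg_vars = np.ones((3, 2))
    selected, kl = select_mixtures(seg_means, seg_vars, ref_means, ref_vars, top_k=2)
    print("KL per component:", kl)
    print("selected components:", selected)
```

Under these assumptions, components that barely move away from the reference carry little segment-specific information and are dropped, which is one plausible way of making per-segment models more discriminative.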


DOI: 10.21437/Interspeech.2016-825

Cite as

Ma, J., Irtza, S., Sriskandaraja, K., Sethu, V., Ambikairajah, E. (2016) Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification. Proc. Interspeech 2016, 435-439.

Bibtex
@inproceedings{Ma+2016,
author={Jianbo Ma and Saad Irtza and Kaavya Sriskandaraja and Vidhyasaharan Sethu and Eliathamby Ambikairajah},
title={Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-825},
url={http://dx.doi.org/10.21437/Interspeech.2016-825},
pages={435--439}
}