Duration mismatch between enrollment and test utterances remains a major concern for the reliability of real-life speaker recognition applications. Two approaches are proposed here to deal with this mismatch when using the i-vector representation. The first is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to any shift between i-vectors drawn from two distinct distributions. The second attempts to map i-vectors of truncated segments of an utterance to the i-vector of the full segment, using deep neural networks (DNN). Our results show that both new approaches outperform standard PLDA by about 10% relative. Moreover, in the case of a duration gap, these back-end methods could complement those that quantify i-vector uncertainty during the extraction process.
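The second approach treats duration compensation as a regression problem: a network learns to predict the full-segment i-vector from the i-vector of a truncated segment. The sketch below illustrates only that general idea and is not the authors' system. The toy data model (truncation as shrinkage toward the mean plus noise), the one-hidden-layer architecture, the mean-squared-error loss, and all names (`make_toy_ivectors`, `MappingMLP`, `mse`) are assumptions made for illustration.

```python
import numpy as np


def make_toy_ivectors(n=200, dim=10, seed=0):
    """Hypothetical toy data: full-segment i-vectors plus degraded copies
    standing in for i-vectors extracted from truncated segments."""
    rng = np.random.default_rng(seed)
    full = rng.normal(size=(n, dim))
    # Assumption: truncation shrinks the i-vector toward the prior mean
    # and adds extra estimation noise (not the paper's data model).
    short = 0.6 * full + 0.3 * rng.normal(size=(n, dim))
    return short, full


def mse(a, b):
    """Mean over utterances of the squared Euclidean error."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))


class MappingMLP:
    """One-hidden-layer regressor: short-segment i-vector -> full-segment i-vector."""

    def __init__(self, dim, hidden=32, seed=1):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=1 / np.sqrt(dim), size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=1 / np.sqrt(hidden), size=(hidden, dim))
        self.b2 = np.zeros(dim)

    def _forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)  # hidden layer
        return h, h @ self.W2 + self.b2     # linear output layer

    def predict(self, x):
        return self._forward(x)[1]

    def fit(self, x, y, lr=0.05, epochs=1000):
        """Full-batch gradient descent on the MSE loss."""
        n = x.shape[0]
        for _ in range(epochs):
            h, y_hat = self._forward(x)
            g_out = 2.0 * (y_hat - y) / n                  # dLoss/dy_hat
            g_pre = (g_out @ self.W2.T) * (1.0 - h ** 2)   # back through tanh
            self.W2 -= lr * (h.T @ g_out)
            self.b2 -= lr * g_out.sum(axis=0)
            self.W1 -= lr * (x.T @ g_pre)
            self.b1 -= lr * g_pre.sum(axis=0)
        return self


short, full = make_toy_ivectors()
model = MappingMLP(dim=short.shape[1])
before = mse(model.predict(short), full)
model.fit(short, full)
after = mse(model.predict(short), full)
print(f"identity: {mse(short, full):.3f}  untrained: {before:.3f}  trained: {after:.3f}")
```

The errors printed here are in-sample on synthetic data; a real evaluation would train on pairs of (truncated, full) i-vectors from held-out speakers and then score the mapped i-vectors with the PLDA back-end.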
Cite as: Bousquet, P.-M., Rouvier, M. (2017) Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification. Proc. Interspeech 2017, 1547-1551, doi: 10.21437/Interspeech.2017-93
@inproceedings{bousquet17_interspeech, author={Pierre-Michel Bousquet and Mickael Rouvier}, title={{Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification}}, year=2017, booktitle={Proc. Interspeech 2017}, pages={1547--1551}, doi={10.21437/Interspeech.2017-93} }