Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification

Pierre-Michel Bousquet, Mickael Rouvier


Duration mismatch between enrollment and test utterances remains a major concern for the reliability of real-life speaker recognition applications. Two approaches are proposed here to deal with this case when using the i-vector representation. The first is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to the case of any shift between i-vectors drawn from two distinct distributions. The second attempts to map the i-vectors of truncated segments of an utterance to the i-vector of the full segment, using deep neural networks (DNN). Our results show that both new approaches outperform standard PLDA by about 10% relative. In the case of a duration gap, these back-end methods could complement those that quantify i-vector uncertainty during the extraction process.


DOI: 10.21437/Interspeech.2017-93

Cite as: Bousquet, P., Rouvier, M. (2017) Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification. Proc. Interspeech 2017, 1547-1551, DOI: 10.21437/Interspeech.2017-93.


@inproceedings{Bousquet2017,
  author={Pierre-Michel Bousquet and Mickael Rouvier},
  title={Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1547--1551},
  doi={10.21437/Interspeech.2017-93},
  url={http://dx.doi.org/10.21437/Interspeech.2017-93}
}