Compensation for phonetic nuisance variability in speaker recognition using DNNs

Themos Stafylakis, Patrick Kenny, Vishwa Gupta, Jahangir Alam, Marcel Kockmann


This paper examines a new way of using a phonetic DNN in text-independent speaker recognition. Inspired by the Subspace GMM approach to speech recognition, we try to extract i-vectors that are invariant to the phonetic content of the utterance. We overcome the assumption of Gaussian-distributed senones by combining DNN posteriors with UBM posteriors, and we derive a complete EM algorithm for training and extracting phonetic-content-compensated i-vectors. A simplified version of the model is also presented, in which the phonetic-content and speaker subspaces are learned in a decoupled way. Covariance adaptation is also examined, where the covariance matrices are re-estimated rather than copied from the UBM. A set of preliminary experimental results is reported on NIST-SRE 2010, showing a modest improvement when the proposed i-vectors are fused with the standard i-vectors.


DOI: 10.21437/Odyssey.2016-49

Cite as

Stafylakis, T., Kenny, P., Gupta, V., Alam, J., Kockmann, M. (2016) Compensation for phonetic nuisance variability in speaker recognition using DNNs. Proc. Odyssey 2016, 340-345.

BibTeX
@inproceedings{Stafylakis+2016,
author={Themos Stafylakis and Patrick Kenny and Vishwa Gupta and Jahangir Alam and Marcel Kockmann},
title={Compensation for phonetic nuisance variability in speaker recognition using DNNs},
year=2016,
booktitle={Odyssey 2016},
doi={10.21437/Odyssey.2016-49},
url={http://dx.doi.org/10.21437/Odyssey.2016-49},
pages={340--345}
}