In this paper, a new way of using a phonetic DNN in text-independent speaker recognition is examined. Inspired by the Subspace GMM approach to speech recognition, we try to extract i-vectors that are invariant to the phonetic content of the utterance. We overcome the assumption of Gaussian-distributed senones by combining DNN and UBM posteriors, and we form a complete EM algorithm for training and extracting phonetic-content-compensated i-vectors. A simplified version of the model is also presented, in which the phonetic-content and speaker subspaces are learned in a decoupled way. Covariance adaptation is also examined, where the covariance matrices are re-estimated rather than copied from the UBM. A set of preliminary experimental results is reported on NIST SRE 2010, showing modest improvements when fused with standard i-vectors.
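To make the role of the DNN posteriors concrete, the following is a minimal sketch (not the authors' implementation) of how sufficient statistics for i-vector extraction can be accumulated when senone posteriors from a phonetic DNN are combined with component posteriors from a UBM. The array shapes, the simple interpolation weight alpha, and the function name collect_stats are illustrative assumptions; the actual combination used in the paper may differ.

    import numpy as np

    def collect_stats(features, dnn_posteriors, ubm_posteriors, alpha=0.5):
        """Accumulate zero- and first-order Baum-Welch statistics per senone.

        features:        (T, D) acoustic frames
        dnn_posteriors:  (T, C) senone posteriors from the phonetic DNN
        ubm_posteriors:  (T, C) component posteriors from the UBM
        alpha:           illustrative interpolation weight between the two
                         posterior sources (an assumption, not from the paper)
        """
        gamma = alpha * dnn_posteriors + (1.0 - alpha) * ubm_posteriors  # (T, C)
        N = gamma.sum(axis=0)     # zero-order statistics, one count per senone
        F = gamma.T @ features    # first-order statistics, shape (C, D)
        return N, F

    # Toy usage with random data standing in for real frames and posteriors.
    T, D, C = 200, 40, 8
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((T, D))
    p_dnn = rng.dirichlet(np.ones(C), size=T)
    p_ubm = rng.dirichlet(np.ones(C), size=T)
    N, F = collect_stats(feats, p_dnn, p_ubm)
    print(N.shape, F.shape)  # (8,) (8, 40)

These statistics would then feed a standard i-vector extractor; the phonetic-content compensation and the EM algorithm described in the paper operate on top of such statistics.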
Stafylakis, T., Kenny, P., Gupta, V., Alam, J., Kockmann, M. (2016) Compensation for phonetic nuisance variability in speaker recognition using DNNs. Proc. Odyssey 2016, 340–345.
@inproceedings{Stafylakis+2016, author={Themos Stafylakis and Patrick Kenny and Vishwa Gupta and Jahangir Alam and Marcel Kockmann}, title={Compensation for phonetic nuisance variability in speaker recognition using DNNs}, year=2016, booktitle={Odyssey 2016}, doi={10.21437/Odyssey.2016-49}, url={http://dx.doi.org/10.21437/Odyssey.2016-49}, pages={340--345} }