Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification

Md Hafizur Rahman, Ivan Himawan, David Dean, Clinton Fookes, Sridha Sridharan


The performance of the current state-of-the-art i-vector based probabilistic linear discriminant analysis (PLDA) speaker verification depends on large volumes of training data, ideally from the target domain. In real-world applications, however, it is often difficult to collect a sufficient amount of target-domain data for successful PLDA training, so out-domain data must be compensated for domain mismatch and used as the basis of PLDA training. In this paper, we introduce a domain-invariant i-vector extraction (DI-IVEC) approach that extracts domain-mismatch-compensated out-domain i-vectors using limited in-domain (target) data for adaptation. In this method, in-domain prior information is used to remove the domain mismatch during the i-vector extraction stage. The proposed method provides at least a 17.3% improvement in EER over an out-domain-only trained baseline when speaker labels are absent, and a 27.2% improvement in EER when speaker labels are known. A further improvement is obtained when the DI-IVEC approach is combined with a domain-invariant covariance normalization (DICN) approach. This combined approach also works well with reduced in-domain adaptation data, requiring only 1000 unlabelled i-vectors to outperform a baseline in-domain PLDA approach.
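The DI-IVEC compensation itself operates inside the i-vector extraction stage, but the underlying idea of aligning out-domain i-vector statistics with those of a small set of in-domain i-vectors can be illustrated with a simple post-hoc sketch. The NumPy snippet below shows only generic first- and second-order statistics matching (a whitening and re-colouring transform); it is not the DI-IVEC or DICN algorithm from the paper, and the function name and interface are assumed for this illustration.

import numpy as np

def match_domain_statistics(out_ivecs, in_ivecs, eps=1e-6):
    # Illustrative only: re-centre and re-colour out-domain i-vectors so that
    # their mean and covariance match statistics estimated from a (possibly
    # small, unlabelled) set of in-domain i-vectors. This is a generic
    # statistics-matching baseline, not the DI-IVEC extractor of the paper.
    mu_out = out_ivecs.mean(axis=0)
    mu_in = in_ivecs.mean(axis=0)
    dim = out_ivecs.shape[1]
    cov_out = np.cov(out_ivecs, rowvar=False) + eps * np.eye(dim)
    cov_in = np.cov(in_ivecs, rowvar=False) + eps * np.eye(dim)

    def mat_power(cov, power):
        # Symmetric matrix power via eigendecomposition (used for +/- 1/2).
        vals, vecs = np.linalg.eigh(cov)
        return (vecs * np.maximum(vals, eps) ** power) @ vecs.T

    # Whiten with the out-domain covariance, then re-colour with the in-domain one.
    transform = mat_power(cov_out, -0.5) @ mat_power(cov_in, 0.5)
    return (out_ivecs - mu_out) @ transform + mu_in

In such a setup, the compensated i-vectors returned by this sketch would replace the raw out-domain i-vectors when training the PLDA model, with the small in-domain set reserved for estimating the adaptation statistics.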


 DOI: 10.21437/Odyssey.2018-22

Cite as: Rahman, M.H., Himawan, I., Dean, D., Fookes, C., Sridharan, S. (2018) Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 155-161, DOI: 10.21437/Odyssey.2018-22.


@inproceedings{Rahman2018,
  author={Md Hafizur Rahman and Ivan Himawan and David Dean and Clinton Fookes and Sridha Sridharan},
  title={Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={155--161},
  doi={10.21437/Odyssey.2018-22},
  url={http://dx.doi.org/10.21437/Odyssey.2018-22}
}