Variational Domain Adversarial Learning for Speaker Verification

Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien

Domain mismatch arises when the distribution of the training data differs from that of the test data. This paper proposes a variational domain adversarial neural network (VDANN), which combines a variational autoencoder (VAE) with a domain adversarial neural network (DANN), to reduce domain mismatch. The DANN part retains speaker identity information and learns a feature space that is robust against domain mismatch, while the VAE part imposes variational regularization on the learned features so that they follow a Gaussian distribution. The representation produced by VDANN is therefore not only speaker-discriminative and domain-invariant but also Gaussian distributed, which is essential for the standard PLDA backend. Experiments on both SRE16 and SRE18-CMN2 show that VDANN outperforms the Kaldi baseline and the standard DANN. The results also suggest that VAE regularization is effective for domain adaptation.
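
To make the combination of objectives concrete, the following is a minimal sketch of how an encoder-side loss of this kind could be assembled. It is not the authors' implementation: the abstract does not give the loss, so the function names, the weights `alpha` and `beta`, the use of a precomputed reconstruction error, and the choice of a standard normal prior are all assumptions made for illustration. The sketch combines a speaker classification loss (to be minimized), a domain classification loss (to be maximized by the encoder, as in DANN-style adversarial training), and a VAE term consisting of a reconstruction error plus the KL divergence between a diagonal Gaussian posterior and a standard normal prior.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def vdann_style_objective(mu, log_var, spk_probs, spk_label,
                          dom_probs, dom_label, recon_error,
                          alpha=1.0, beta=1.0):
    """Hypothetical encoder-side objective (all weights are assumptions).

    The encoder is trained to minimize this value: it lowers the speaker
    loss, raises the domain loss (hence the minus sign), and lowers the
    VAE term that pulls the latent features toward a Gaussian.
    """
    speaker_loss = cross_entropy(spk_probs, spk_label)
    domain_loss = cross_entropy(dom_probs, dom_label)
    vae_loss = recon_error + gaussian_kl(mu, log_var)
    return speaker_loss - alpha * domain_loss + beta * vae_loss
```

In this sketch the KL term is zero exactly when the posterior already matches the standard normal prior, which is the sense in which the regularizer encourages Gaussian-distributed features. In a real adversarial setup the domain classifier would be trained separately to minimize its own loss, for example through a gradient reversal layer.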

DOI: 10.21437/Interspeech.2019-2168

Cite as: Tu, Y., Mak, M., Chien, J. (2019) Variational Domain Adversarial Learning for Speaker Verification. Proc. Interspeech 2019, 4315-4319, DOI: 10.21437/Interspeech.2019-2168.

  author={Youzhi Tu and Man-Wai Mak and Jen-Tzung Chien},
  title={{Variational Domain Adversarial Learning for Speaker Verification}},
  booktitle={Proc. Interspeech 2019},