VAE-Based Regularization for Deep Speaker Embedding

Yang Zhang, Lantian Li, Dong Wang


Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called 'x-vectors') are not Gaussian, which degrades performance when they are scored with the widely used PLDA back-end. In this paper, we propose a regularization approach based on the Variational Auto-Encoder (VAE). This model transforms x-vectors into a latent space where the mapped latent codes are more Gaussian and hence more suitable for PLDA scoring.
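The abstract does not specify the VAE architecture or training objective, so the following is only a minimal sketch of a generic VAE objective under stated assumptions. It assumes a single linear encoder and decoder (the names `TinyVAE`, `kl_to_standard_normal`, `matvec` and `init_layer` are hypothetical), and it illustrates the mechanism the abstract relies on: the KL term pulls each x-vector's latent code toward a standard Gaussian prior, and the posterior mean is then taken as the regularized embedding that would be passed to PLDA.

```python
import math
import random


def matvec(W, b, x):
    """Affine map W @ x + b on plain Python lists."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(n_out, n_in, rng, scale=0.1):
    """Hypothetical small random init for a linear layer; returns (W, b)."""
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


class TinyVAE:
    """Hypothetical linear VAE over x-vectors (illustration only, untrained)."""

    def __init__(self, x_dim, z_dim, seed=0):
        self.rng = random.Random(seed)
        self.enc_mu = init_layer(z_dim, x_dim, self.rng)      # x -> mean of q(z|x)
        self.enc_logvar = init_layer(z_dim, x_dim, self.rng)  # x -> log-variance of q(z|x)
        self.dec = init_layer(x_dim, z_dim, self.rng)         # z -> reconstructed x

    def encode(self, x):
        return matvec(*self.enc_mu, x), matvec(*self.enc_logvar, x)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, 1), sampled per dimension
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def loss(self, x):
        """Return (total, reconstruction, KL) for one x-vector."""
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_hat = matvec(*self.dec, z)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))  # squared-error reconstruction
        kl = kl_to_standard_normal(mu, logvar)                # Gaussian-prior regularizer
        return recon + kl, recon, kl

    def embed(self, x):
        # Posterior mean used as the regularized embedding for PLDA scoring
        mu, _ = self.encode(x)
        return mu


vae = TinyVAE(x_dim=8, z_dim=3, seed=1)
x_vector = [0.5, -1.0, 0.2, 0.0, 1.5, -0.3, 0.7, -0.8]
total, recon, kl = vae.loss(x_vector)
print(f"total={total:.4f} recon={recon:.4f} kl={kl:.4f}")
print("latent code:", vae.embed(x_vector))
```

In practice the encoder and decoder parameters would be learned by minimizing this loss with gradient descent in an autodiff framework, which the sketch omits. The KL term is zero only when a code's posterior is exactly N(0, I), which is why minimizing it encourages the latent space to be more Gaussian than the raw x-vectors.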


DOI: 10.21437/Interspeech.2019-2486

Cite as: Zhang, Y., Li, L., Wang, D. (2019) VAE-Based Regularization for Deep Speaker Embedding. Proc. Interspeech 2019, 4020-4024, DOI: 10.21437/Interspeech.2019-2486.


@inproceedings{Zhang2019,
  author={Yang Zhang and Lantian Li and Dong Wang},
  title={{VAE-Based Regularization for Deep Speaker Embedding}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4020--4024},
  doi={10.21437/Interspeech.2019-2486},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2486}
}