ISCA Archive Interspeech 2021

Variational Information Bottleneck Based Regularization for Speaker Recognition

Dan Wang, Yuanjie Dong, Yaxing Li, Yunfei Zi, Zhihui Zhang, Xiaoqi Li, Shengwu Xiong

Speaker recognition (SR) is inevitably affected by noise in real-life scenarios, resulting in decreased recognition accuracy. In this paper, we introduce a novel regularization method, the variational information bottleneck (VIB), into speaker recognition to extract robust speaker embeddings. VIB encourages the neural network to discard as much speaker-identity-irrelevant information as possible. We also propose a more effective feature extractor, VoVNet with an ultra-lightweight subspace attention module (ULSAM). ULSAM infers a separate attention map for each feature-map subspace, enabling efficient learning of cross-channel information along with multi-scale and multi-frequency feature representations. The experimental results demonstrate that our proposed framework outperforms the ResNet-based baseline by 11.4% in terms of equal error rate (EER). The VIB regularization method gives a further performance boost, yielding an 18.9% EER decrease.
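For readers unfamiliar with the technique, the following is a minimal sketch of a generic VIB embedding head in PyTorch, not the authors' implementation: the embedding is sampled from a learned Gaussian posterior via the reparameterization trick, and a KL term to a standard-normal prior is added to the speaker classification loss, weighted by a hypothetical trade-off coefficient beta. The class and variable names (VIBLayer, beta) are illustrative assumptions.

import torch
import torch.nn as nn

class VIBLayer(nn.Module):
    # Generic variational information bottleneck head: maps a pooled
    # utterance-level feature to a stochastic embedding z ~ N(mu, sigma^2)
    # and returns the KL divergence to N(0, I) as a regularizer.
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, emb_dim)      # posterior mean
        self.logvar = nn.Linear(in_dim, emb_dim)  # posterior log-variance

    def forward(self, x: torch.Tensor):
        mu = self.mu(x)
        logvar = self.logvar(x)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1).mean()
        return z, kl

# Training-time usage (illustrative): total loss = classification loss + beta * kl,
# where beta controls how strongly identity-irrelevant information is compressed away.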


doi: 10.21437/Interspeech.2021-482

Cite as: Wang, D., Dong, Y., Li, Y., Zi, Y., Zhang, Z., Li, X., Xiong, S. (2021) Variational Information Bottleneck Based Regularization for Speaker Recognition. Proc. Interspeech 2021, 1054-1058, doi: 10.21437/Interspeech.2021-482

@inproceedings{wang21j_interspeech,
  author={Dan Wang and Yuanjie Dong and Yaxing Li and Yunfei Zi and Zhihui Zhang and Xiaoqi Li and Shengwu Xiong},
  title={{Variational Information Bottleneck Based Regularization for Speaker Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1054--1058},
  doi={10.21437/Interspeech.2021-482}
}