ISCA Archive Interspeech 2021

Multi-Channel Speaker Verification for Single and Multi-Talker Speech

Saurabh Kataria, Shi-Xiong Zhang, Dong Yu

To improve speaker verification in real scenarios with interference speakers, noise, and reverberation, we propose to bring together advancements made in multi-channel speech features. Specifically, we combine spectral, spatial, and directional features, which include inter-channel phase difference, multi-channel sinc convolutions, directional power ratio features, and angle features. To maximally leverage supervised learning, our framework is also equipped with multi-channel speech enhancement and voice activity detection. On simulated, replayed, and real recordings alike, we observe large and consistent improvements at various degradation levels. On real recordings of multi-talker speech, we achieve a 36% relative reduction in equal error rate (EER) w.r.t. a single-channel baseline. We find the improvements from speaker-dependent directional features to be more consistent in multi-talker conditions than in clean ones. Lastly, we investigate whether the learned multi-channel speaker embedding space can be made more discriminative through contrastive loss-based fine-tuning. With a simple choice of triplet loss, we observe a further 8.3% relative reduction in EER.
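Of the spatial features listed above, the inter-channel phase difference (IPD) is the most commonly used: it is the per-frequency phase gap between each microphone and a reference channel, which encodes the time delay of arrival. Below is a minimal NumPy sketch of IPD extraction; the function name, window choice, and parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ipd_features(multi_ch, ref_ch=0, n_fft=512, hop=256):
    """Inter-channel phase difference (IPD) w.r.t. a reference channel.

    multi_ch: array of shape (channels, samples).
    Returns an array of shape (channels - 1, frames, n_fft // 2 + 1)
    holding phase differences in radians, wrapped to (-pi, pi].
    """
    def stft(x):
        # Simple framed real FFT with a Hann window (illustrative STFT).
        n_frames = 1 + (len(x) - n_fft) // hop
        frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
        return np.fft.rfft(frames * np.hanning(n_fft), axis=-1)

    specs = np.stack([stft(ch) for ch in multi_ch])          # (C, T, F)
    phase = np.angle(specs)
    # Phase of every non-reference channel minus the reference channel.
    ipd = phase[np.arange(len(multi_ch)) != ref_ch] - phase[ref_ch]
    # Wrap to (-pi, pi] so a small true delay maps to a small feature value.
    return np.angle(np.exp(1j * ipd))
```

In practice these IPD maps are concatenated with spectral features (e.g. log-magnitude spectra) along the channel axis before being fed to the embedding network; a pure sample delay between channels shows up as a phase difference that grows linearly with frequency.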


doi: 10.21437/Interspeech.2021-681

Cite as: Kataria, S., Zhang, S.-X., Yu, D. (2021) Multi-Channel Speaker Verification for Single and Multi-Talker Speech. Proc. Interspeech 2021, 4608-4612, doi: 10.21437/Interspeech.2021-681

@inproceedings{kataria21b_interspeech,
  author={Saurabh Kataria and Shi-Xiong Zhang and Dong Yu},
  title={{Multi-Channel Speaker Verification for Single and Multi-Talker Speech}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={4608--4612},
  doi={10.21437/Interspeech.2021-681}
}