This paper describes the system developed by Intelligent Voice for the IberSpeech 2022 Albayzin Evaluations Speaker Diarization and Identity Assignment Challenge (SDIAC). The presented Variational Bayes x-vector Voice Print Extraction (VBxVPE) system captures vocal variation through multiple x-vector representations, combined with two-stage clustering and outlier-detection refinement. It also uses a Deep-Encoder Convolutional Autoencoder Denoiser (DE-CADE) network to denoise segments containing noise and music, making speaker recognition and diarization more robust. Evaluated on the Radiotelevisión Española (RTVE) 2022 evaluation dataset, the system obtained a Diarization Error Rate (DER) of 37.2% on the Speaker Diarization and Identity Assignment task and 44.34% on the Speaker Diarization-only task.
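DER sums missed speech, false-alarm speech and speaker confusion, then divides by the total reference speech time. The sketch below is a simplified, frame-level illustration of that definition. It assumes at most one speaker per frame (no overlapped speech), uses no forgiveness collar, and finds the best one-to-one mapping between hypothesis and reference speakers by brute force. The function name `frame_der` and the frame-label representation are hypothetical; this is not the challenge's official scoring tool.

```python
from itertools import permutations
from typing import Optional, Sequence

Label = Optional[str]  # None marks a non-speech frame


def frame_der(ref: Sequence[Label], hyp: Sequence[Label]) -> float:
    """Frame-level DER, assuming at most one speaker per frame and no collar."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    total_speech = sum(r is not None for r in ref)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(ref, hyp))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both_speech = sum(r is not None and h is not None for r, h in pairs)

    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})

    # Enumerate every one-to-one hypothesis->reference speaker mapping
    # (brute force, only practical for a handful of speakers).
    if len(hyp_spk) <= len(ref_spk):
        mappings = (dict(zip(hyp_spk, p)) for p in permutations(ref_spk, len(hyp_spk)))
    else:
        mappings = (dict(zip(p, ref_spk)) for p in permutations(hyp_spk, len(ref_spk)))

    # Keep the mapping that attributes the most speech frames correctly.
    best_correct = max(
        sum(r is not None and h is not None and m.get(h) == r for r, h in pairs)
        for m in mappings
    )
    confusion = both_speech - best_correct
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    reference = ["A", "A", "A", "B", "B", None]
    hypothesis = ["s1", "s1", "s2", "s2", "s2", "s2"]
    # 1 false-alarm frame + 1 confused frame over 5 speech frames -> 0.4
    print(frame_der(reference, hypothesis))
```

Real scoring (for example NIST md-eval style tools) works on time segments, applies a collar around reference boundaries and handles overlapping speakers, so its numbers will differ from this toy frame count.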
Cite as: Shrestha, R., Glackin, C., Wall, J., Cannings, N. (2022) Intelligent Voice Speaker Recognition and Diarization System for IberSpeech 2022 Albayzin Evaluations Speaker Diarization and Identity Assignment Challenge. Proc. IberSPEECH 2022, 281-283
@inproceedings{shrestha22_iberspeech,
  author={Roman Shrestha and Cornelius Glackin and Julie Wall and Nigel Cannings},
  title={{Intelligent Voice Speaker Recognition and Diarization System for IberSpeech 2022 Albayzin Evaluations Speaker Diarization and Identity Assignment Challenge}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={281--283}
}