ISCA Archive Interspeech 2021

Scenario-Dependent Speaker Diarization for DIHARD-III Challenge

Yu-Xuan Wang, Jun Du, Maokui He, Shu-Tong Niu, Lei Sun, Chin-Hui Lee

In this study, we propose a scenario-dependent speaker diarization approach that uses a divide-and-conquer strategy to handle the diverse scenarios of the 11 domains encountered in the DIHARD-III challenge. First, using a ResNet-based audio domain classifier, all domains in the DIHARD-III challenge are divided into several scenarios according to different impact factors, such as background noise level, number of speakers, and speaker overlap ratio. For each scenario, a different combination of techniques is designed, aiming at the best performance in terms of both diarization error rate (DER) and run-time efficiency. For low signal-to-noise ratio (SNR) scenarios, speech enhancement based on a progressive learning network with multiple intermediate SNR targets is adopted for pre-processing. Conventional clustering-based speaker diarization is used mainly to handle speech segments with non-overlapping speakers, while separation-based or neural speaker diarization is used to cope with overlapping speech regions and is combined with an iterative fine-tuning strategy to boost generalization ability. We also explore post-processing to perform system fusion and selection. In the DIHARD-III challenge, our scenario-dependent system won first place among all submitted systems and significantly outperforms the state-of-the-art clustering-based speaker diarization system, yielding relative DER reductions of 32.17% and 28.34% on the development and evaluation sets of Track 1, respectively.
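As a reading aid for the reported figures, the following minimal sketch shows how a DER and a relative DER reduction are commonly computed. It assumes the usual definition of DER as missed speech, false alarm, and speaker confusion time divided by total reference speech time. The function names and all durations are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER in percent: total error time over total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline_der, system_der):
    """Return the relative DER reduction in percent with respect to the baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der


# Hypothetical durations in seconds (illustrative only, not from the paper).
baseline = diarization_error_rate(
    missed=60.0, false_alarm=20.0, confusion=120.0, total_speech=1000.0
)
system = diarization_error_rate(
    missed=50.0, false_alarm=15.0, confusion=85.0, total_speech=1000.0
)
reduction = relative_reduction(baseline, system)

print(f"baseline DER = {baseline:.2f}%")
print(f"system DER = {system:.2f}%")
print(f"relative DER reduction = {reduction:.2f}%")
```

A relative reduction is measured against the baseline's own DER, so the 32.17% and 28.34% figures above are fractions of the baseline error, not absolute percentage-point differences.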


doi: 10.21437/Interspeech.2021-516

Cite as: Wang, Y.-X., Du, J., He, M., Niu, S.-T., Sun, L., Lee, C.-H. (2021) Scenario-Dependent Speaker Diarization for DIHARD-III Challenge. Proc. Interspeech 2021, 3106-3110, doi: 10.21437/Interspeech.2021-516

@inproceedings{wang21y_interspeech,
  author={Yu-Xuan Wang and Jun Du and Maokui He and Shu-Tong Niu and Lei Sun and Chin-Hui Lee},
  title={{Scenario-Dependent Speaker Diarization for DIHARD-III Challenge}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3106--3110},
  doi={10.21437/Interspeech.2021-516}
}