ISCA Archive Interspeech 2021

The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III

Weiqing Wang, Danwei Cai, Jin Wang, Qingjian Lin, Xuyang Wang, Mi Hong, Ming Li

This paper describes the systems developed by the DKU-Duke-Lenovo team for the Fearless Steps Challenge Phase III. For the speech activity detection (SAD) task, we employ a U-Net-based model, which has not previously been used for SAD, and obtain a DCF of 1.915% on the eval set. For the speaker identification (SID) task, we adopt the ResNet-SE and ECAPA-TDNN models and obtain a Top-5 accuracy of 86.21%. For the speaker diarization (SD) task, we employ several different clustering methods. In addition, domain adaptation, system fusion, and Target-Speaker Voice Activity Detection (TS-VAD) significantly improve the SD performance. We obtain a DER of 12.32% on track 2, with the major contribution coming from our ResNet-based TS-VAD model. We finally achieve a first-place ranking for SD and SID and a second-place ranking for SAD in the challenge.
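
To make the SID metric concrete, the following is a minimal sketch of how a Top-5 accuracy could be computed. It is not the challenge's official scoring tool or the authors' code: the function name `top_k_accuracy`, the per-utterance score lists, and the integer speaker labels are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of utterances whose true speaker is among the k best-scoring candidates.

    Assumed layout (hypothetical): scores[u][s] is the system's score for
    speaker index s on utterance u, and labels[u] is the true speaker index.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one utterance is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank speaker indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda s: row[s], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

With `k=5`, an utterance counts as correct whenever the true speaker appears anywhere in the five highest-scoring candidates, so the Top-5 figure is always at least as high as the Top-1 accuracy on the same scores.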


doi: 10.21437/Interspeech.2021-235

Cite as: Wang, W., Cai, D., Wang, J., Lin, Q., Wang, X., Hong, M., Li, M. (2021) The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III. Proc. Interspeech 2021, 1044-1048, doi: 10.21437/Interspeech.2021-235

@inproceedings{wang21i_interspeech,
  author={Weiqing Wang and Danwei Cai and Jin Wang and Qingjian Lin and Xuyang Wang and Mi Hong and Ming Li},
  title={{The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1044--1048},
  doi={10.21437/Interspeech.2021-235}
}