Improved Speaker-Dependent Separation for CHiME-5 Challenge

Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu

This paper summarizes several contributions for improving the speaker-dependent separation system for the CHiME-5 challenge, which targets multi-channel, highly overlapped conversational speech recognition in a dinner-party scenario with reverberation and non-stationary noise. Specifically, we adopt a speaker-aware training method that uses i-vectors as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce the WER to 60.15%, which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.
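As a rough illustration of how one unified model can serve all speakers, the sketch below appends a speaker embedding to every input frame, so the same mixture yields different network inputs depending on the target speaker. This is an assumption for illustration only: the abstract does not say where or how the i-vector enters the network, and the function name `append_speaker_embedding`, the feature dimensions, and the toy values are all hypothetical.

```python
from typing import List, Sequence


def append_speaker_embedding(
    frames: Sequence[Sequence[float]],
    ivector: Sequence[float],
) -> List[List[float]]:
    """Return a copy of `frames` with `ivector` concatenated to every frame.

    Hypothetical helper: input-level concatenation is one common way to
    condition a model on a speaker embedding, not necessarily the paper's.
    """
    return [list(frame) + list(ivector) for frame in frames]


# Toy mixture: 3 frames of 4-dimensional features (made-up values).
mixture = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

# Toy 2-dimensional "i-vectors" for two speakers (real ones are much larger).
ivec_a = [1.0, 0.0]
ivec_b = [0.0, 1.0]

# Same mixture, different conditioning: one model, one input per target speaker.
inputs_a = append_speaker_embedding(mixture, ivec_a)
inputs_b = append_speaker_embedding(mixture, ivec_b)
```

Under this assumption, selecting a different target speaker only changes the appended embedding, so no per-speaker model is needed.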

DOI: 10.21437/Interspeech.2019-1569

Cite as: Wu, J., Xu, Y., Zhang, S., Chen, L., Yu, M., Xie, L., Yu, D. (2019) Improved Speaker-Dependent Separation for CHiME-5 Challenge. Proc. Interspeech 2019, 466-470, DOI: 10.21437/Interspeech.2019-1569.

  author={Jian Wu and Yong Xu and Shi-Xiong Zhang and Lianwu Chen and Meng Yu and Lei Xie and Dong Yu},
  title={{Improved Speaker-Dependent Separation for CHiME-5 Challenge}},
  booktitle={Proc. Interspeech 2019},