Speaker-Aware Monaural Speech Separation

Jiahao Xu, Kun Hu, Chang Xu, Duc Chung Tran, Zhiyong Wang


Predicting Time-Frequency (T-F) masks and applying them to mixture signals has been used successfully for speech separation. However, existing studies have not fully exploited the identity context of a speaker when inferring masks. In this paper, we propose a novel speaker-aware monaural speech separation model. We first devise an encoder that disentangles speaker identity information under supervision from an auxiliary speaker verification task. We then develop a spectrogram masking network that predicts speaker masks, which are applied to the mixture signal to reconstruct the source signals. Experimental results on two WSJ0 mixed datasets demonstrate that our proposed model outperforms existing models in different separation scenarios.
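
As a rough illustration of the T-F masking idea mentioned above, the following minimal sketch builds oracle ratio masks from two known toy sources and applies them to the spectrum of their mixture. It is not the proposed model: the paper predicts masks with a network conditioned on speaker identity, whereas here the masks are computed directly from the ground-truth sources. The single-frame naive DFT, the sinusoidal "speakers", and all function names (`dft`, `idft`, `ratio_masks`, `apply_mask`) are assumptions made for this example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [
        (sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def ratio_masks(source_specs, eps=1e-8):
    """Oracle ratio masks: each source's share of the summed magnitudes per bin."""
    mags = [[abs(v) for v in spec] for spec in source_specs]
    totals = [sum(column) + eps for column in zip(*mags)]
    return [[m / total for m, total in zip(mag, totals)] for mag in mags]


def apply_mask(mask, mixture_spec):
    """Element-wise product of a real-valued mask and the mixture spectrum."""
    return [m * x for m, x in zip(mask, mixture_spec)]


# Toy single-frame mixture: two hypothetical "speakers" as sinusoids in different bins.
N = 64
s1 = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
s2 = [0.5 * math.sin(2 * math.pi * 12 * t / N) for t in range(N)]
mixture = [a + b for a, b in zip(s1, s2)]

mixture_spec = dft(mixture)
masks = ratio_masks([dft(s1), dft(s2)])

# Reconstruct each source by masking the mixture spectrum and inverting it.
est1 = idft(apply_mask(masks[0], mixture_spec))
est2 = idft(apply_mask(masks[1], mixture_spec))

err1 = max(abs(a - b) for a, b in zip(est1, s1))
err2 = max(abs(a - b) for a, b in zip(est2, s2))
print("max reconstruction error, source 1:", err1)
print("max reconstruction error, source 2:", err2)
```

Because the two toy sources occupy disjoint frequency bins, the oracle masks recover them almost exactly. Real speech overlaps heavily in time and frequency, which is why mask prediction is the hard part of the problem and where speaker identity information could plausibly help.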


DOI: 10.21437/Interspeech.2020-2483

Cite as: Xu, J., Hu, K., Xu, C., Tran, D.C., Wang, Z. (2020) Speaker-Aware Monaural Speech Separation. Proc. Interspeech 2020, 1451-1455, DOI: 10.21437/Interspeech.2020-2483.


@inproceedings{Xu2020,
  author={Jiahao Xu and Kun Hu and Chang Xu and Duc Chung Tran and Zhiyong Wang},
  title={{Speaker-Aware Monaural Speech Separation}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1451--1455},
  doi={10.21437/Interspeech.2020-2483},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2483}
}