Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information

Rongzhi Gu, Lianwu Chen, Shi-Xiong Zhang, Jimeng Zheng, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu


The recent exploration of deep learning for supervised speech separation has significantly accelerated progress on the multi-talker speech separation problem. Multi-channel approaches have attracted much research attention because they can exploit spatial information. In this paper, we explore leveraging directional features, which indicate the dominance of the speech source from the desired target direction, integrated with the power spectra and inter-channel spatial features at the input level, for target speaker separation. In addition, we incorporate an attention mechanism that dynamically tunes the model’s attention to the reliable input features, alleviating the spatial ambiguity problem when multiple speakers are closely located. We demonstrate, on the far-field WSJ0 2-mix dataset, that our proposed approach significantly outperforms baseline single-channel and multi-channel speech separation methods.
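As an illustration of the idea, a commonly used directional feature in this line of work is the cosine similarity between the observed inter-channel phase difference (IPD) and the theoretical phase delay implied by the target direction. The sketch below is a minimal, hypothetical implementation for a two-microphone array; the function name, the 5 cm microphone spacing, and the free-field far-field delay model are illustrative assumptions, not details from the paper.

```python
import numpy as np

def directional_feature(X1, X2, freqs, theta, mic_dist=0.05, c=343.0):
    """Directional (angle) feature for a two-mic array (illustrative sketch).

    X1, X2 : complex STFTs of the two channels, shape (freq, time).
    freqs  : center frequency of each bin in Hz, shape (freq,).
    theta  : assumed target direction of arrival in radians.
    Returns cos(IPD - target phase delay); values near 1 mark
    time-frequency bins dominated by the source from direction theta.
    """
    # Observed inter-channel phase difference per time-frequency bin.
    ipd = np.angle(X1) - np.angle(X2)
    # Theoretical phase delay for a far-field source at angle theta.
    tpd = 2 * np.pi * freqs[:, None] * mic_dist * np.cos(theta) / c
    # Cosine handles 2*pi phase wrapping automatically.
    return np.cos(ipd - tpd)
```

Such a feature can be concatenated with the log power spectrum and IPD features at the network input, so the model receives an explicit cue for which bins belong to the target direction.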


DOI: 10.21437/Interspeech.2019-2266

Cite as: Gu, R., Chen, L., Zhang, S., Zheng, J., Xu, Y., Yu, M., Su, D., Zou, Y., Yu, D. (2019) Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information. Proc. Interspeech 2019, 4290-4294, DOI: 10.21437/Interspeech.2019-2266.


@inproceedings{Gu2019,
  author={Rongzhi Gu and Lianwu Chen and Shi-Xiong Zhang and Jimeng Zheng and Yong Xu and Meng Yu and Dan Su and Yuexian Zou and Dong Yu},
  title={{Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4290--4294},
  doi={10.21437/Interspeech.2019-2266},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2266}
}