Binaural Reverberant Speech Separation Based on Deep Neural Networks

Xueliang Zhang, DeLiang Wang


Supervised learning has exhibited great potential for speech separation in recent years. In this paper, we focus on separating target speech in reverberant conditions from binaural inputs using supervised learning. Specifically, a deep neural network (DNN) is constructed to map both spectral and spatial features to a training target. For spectral feature extraction, we first convert the binaural inputs into a single signal by applying a fixed beamformer. A new spatial feature is proposed and extracted to complement the spectral features. The training target is the recently suggested ideal ratio mask (IRM). Systematic evaluations and comparisons show that the proposed system achieves good separation performance and substantially outperforms existing algorithms under challenging multi-source and reverberant environments.
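The IRM mentioned as the training target is commonly defined, per time-frequency unit, as the ratio of speech energy to total (speech plus noise) energy raised to a tunable exponent. The sketch below illustrates that widely used formulation; the exponent `beta = 0.5` and the small stabilizing epsilon are conventional choices assumed here, not details taken from this paper.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Compute the ideal ratio mask over time-frequency power spectrograms.

    speech_power, noise_power: non-negative arrays of shape (T, F).
    beta: exponent applied to the energy ratio (0.5 is a common choice).
    eps: small constant to avoid division by zero in silent units.
    """
    return (speech_power / (speech_power + noise_power + eps)) ** beta

# Toy 2x2 time-frequency power spectrograms for illustration.
speech = np.array([[4.0, 1.0],
                   [0.0, 9.0]])
noise = np.array([[1.0, 1.0],
                  [1.0, 0.0]])
mask = ideal_ratio_mask(speech, noise)
```

A DNN trained with this target predicts `mask` from input features; at test time the predicted mask is applied to the mixture spectrogram to estimate the target speech.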


DOI: 10.21437/Interspeech.2017-297

Cite as: Zhang, X., Wang, D. (2017) Binaural Reverberant Speech Separation Based on Deep Neural Networks. Proc. Interspeech 2017, 2018-2022, DOI: 10.21437/Interspeech.2017-297.


@inproceedings{Zhang2017,
  author={Xueliang Zhang and DeLiang Wang},
  title={Binaural Reverberant Speech Separation Based on Deep Neural Networks},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={2018--2022},
  doi={10.21437/Interspeech.2017-297},
  url={http://dx.doi.org/10.21437/Interspeech.2017-297}
}