Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement

Meng Ge, Longbiao Wang, Nan Li, Hao Shi, Jianwu Dang, Xiangang Li


Speech enhancement aims to preserve the clean speech signal while reducing noise, so that robust communication systems can be built. Driven by the success of deep neural networks (DNNs), significant progress has been made. Nevertheless, the accuracy of speech enhancement systems remains unsatisfactory because varied environmental and contextual information is insufficiently considered in complex cases. To address these problems, this research proposes an end-to-end environment-dependent attention-driven approach. A convolutional neural network without pooling operations is used to fully exploit local frequency-temporal patterns. An attention mechanism is then integrated into a bidirectional long short-term memory network to acquire the weighted dynamic context between consecutive frames. Furthermore, dynamic environment estimation and phase correction improve the generalization ability. Extensive experiments on the REVERB challenge data demonstrate that the proposed approach outperforms existing methods, improving PESQ from 2.56 to 2.87 and SRMR from 4.95 to 5.50 compared with a conventional DNN.
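
To make the idea of a "weighted dynamic context between consecutive frames" concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes scaled dot-product scoring over a small symmetric window of neighbouring frames; the abstract does not state the scoring function or the window size, so the function name `attention_context`, the `window` parameter, and the random array standing in for recurrent-layer outputs are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_context(frames, window=2):
    """Attention-weighted context for each frame (illustrative only).

    frames: (T, D) float array standing in for per-frame recurrent outputs.
    window: number of neighbouring frames on each side (an assumption).
    Returns the (T, D) context vectors and the (T, T) attention weights.
    """
    num_frames, dim = frames.shape
    context = np.zeros_like(frames)
    weights = np.zeros((num_frames, num_frames))
    for t in range(num_frames):
        lo, hi = max(0, t - window), min(num_frames, t + window + 1)
        neighbours = frames[lo:hi]                        # (W, D)
        # Scaled dot-product score between frame t and each neighbour.
        scores = neighbours @ frames[t] / np.sqrt(dim)    # (W,)
        w = softmax(scores)
        weights[t, lo:hi] = w
        context[t] = w @ neighbours                       # weighted sum
    return context, weights

# Random values stand in for real recurrent-layer outputs (6 frames, 4 dims).
rng = np.random.default_rng(0)
hidden = rng.standard_normal((6, 4))
context, weights = attention_context(hidden, window=2)
```

Each row of `weights` sums to one and is zero outside the frame's window, so every context vector is a convex combination of nearby frames. A trained system would typically learn the scoring function rather than use raw dot products.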


DOI: 10.21437/Interspeech.2019-1477

Cite as: Ge, M., Wang, L., Li, N., Shi, H., Dang, J., Li, X. (2019) Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement. Proc. Interspeech 2019, 3153-3157, DOI: 10.21437/Interspeech.2019-1477.


@inproceedings{Ge2019,
  author={Meng Ge and Longbiao Wang and Nan Li and Hao Shi and Jianwu Dang and Xiangang Li},
  title={{Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3153--3157},
  doi={10.21437/Interspeech.2019-1477},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1477}
}