Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation

Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, Anyan Shi


Monaural speech separation techniques are still far from satisfactory; the task remains challenging due to interference from multiple sources. Recently, deep dilated temporal convolutional networks (TCNs) have proven very effective for sequence modeling. This work explores how to extend the TCN to yield a new, state-of-the-art monaural speech separation method. First, a gating mechanism is introduced to produce a gated TCN, in which the gated activation controls the flow of information. Further, to combine multiple trained models, reduce performance variance, and improve separation quality, we apply the principle of ensemble learning within the gated TCN architecture by replacing the convolutional module at each dilation factor with multiple identical parallel branches of convolutional components. Finally, we propose to train the network by directly optimizing the utterance-level signal-to-distortion ratio (SDR) in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix corpus yield an 18.2 dB improvement in SDR, indicating that the proposed network can improve the performance of speaker separation tasks.
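To make the architectural ideas concrete, the sketch below implements in PyTorch one gated dilated convolutional block and the intra-parallel arrangement of identical branches described above. It is a minimal sketch, not the paper's exact configuration: the layer sizes, branch count, averaging of branch outputs, and module names (GatedDilatedBlock, IntraParallelBlock) are all illustrative assumptions.

import torch
import torch.nn as nn

class GatedDilatedBlock(nn.Module):
    """One dilated 1-D conv block with a multiplicative tanh/sigmoid gate."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # "same" padding for odd kernels
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size,
                                     dilation=dilation, padding=pad)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   dilation=dilation, padding=pad)

    def forward(self, x):
        # Gated activation: the sigmoid branch controls the flow of information.
        out = torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))
        return x + out  # residual connection

class IntraParallelBlock(nn.Module):
    """Several identical gated branches in parallel, averaged ensemble-style."""
    def __init__(self, channels, kernel_size, dilation, num_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(
            [GatedDilatedBlock(channels, kernel_size, dilation)
             for _ in range(num_branches)])

    def forward(self, x):
        # Average the outputs of the identical branches (assumed combination rule).
        return torch.stack([b(x) for b in self.branches]).mean(dim=0)

# Stack blocks with exponentially growing dilation factors, as in a TCN.
tcn = nn.Sequential(*[IntraParallelBlock(64, 3, 2 ** i) for i in range(8)])
x = torch.randn(4, 64, 1000)  # (batch, channels, time)
y = tcn(x)                    # same shape as the input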
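The training objective can likewise be sketched. Below is a minimal, self-contained example of an utterance-level permutation invariant training (uPIT) loss using SI-SNR, a common scale-invariant surrogate for the SDR objective the abstract names; the two-speaker setting matches WSJ0-2mix, but the exact loss formulation and all variable names here are assumptions for illustration.

import itertools
import torch

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR per utterance; est, ref: (batch, time)."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to get the target component.
    proj = (est * ref).sum(-1, keepdim=True) * ref \
        / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    return 10 * torch.log10((proj.pow(2).sum(-1) + eps)
                            / (noise.pow(2).sum(-1) + eps))

def upit_loss(est_sources, ref_sources):
    """est_sources, ref_sources: (batch, num_spk, time). Returns a scalar loss."""
    num_spk = est_sources.size(1)
    snrs = []
    for perm in itertools.permutations(range(num_spk)):
        # Mean SI-SNR over speakers for this assignment of outputs to targets.
        snr = torch.stack([si_snr(est_sources[:, i], ref_sources[:, p])
                           for i, p in enumerate(perm)]).mean(dim=0)
        snrs.append(snr)
    # Keep the best permutation per utterance; negate to obtain a loss.
    best = torch.stack(snrs, dim=0).max(dim=0).values
    return -best.mean()

est = torch.randn(4, 2, 16000, requires_grad=True)  # (batch, speakers, samples)
ref = torch.randn(4, 2, 16000)                      # clean targets
loss = upit_loss(est, ref)
loss.backward()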


 DOI: 10.21437/Interspeech.2019-1373

Cite as: Shi, Z., Lin, H., Liu, L., Liu, R., Han, J., Shi, A. (2019) Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation. Proc. Interspeech 2019, 3183-3187, DOI: 10.21437/Interspeech.2019-1373.


@inproceedings{Shi2019,
  author={Ziqiang Shi and Huibin Lin and Liu Liu and Rujie Liu and Jiqing Han and Anyan Shi},
  title={{Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation}},
  year={2019},
  booktitle={Proc. Interspeech 2019},
  pages={3183--3187},
  doi={10.21437/Interspeech.2019-1373},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1373}
}