End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network

Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada, Jiqing Han


Monaural speech separation remains far from satisfactory and is a challenging task because of the interference among multiple sound sources. Deep dilated temporal convolutional networks (TCNs) have proven very effective in sequence modeling, and this work investigates how to extend the TCN into a new state-of-the-art approach for monaural speech separation. First, a novel gating mechanism is introduced, resulting in a gated TCN; the gated activation controls the flow of information. Further, to remedy the temporal scale variation caused by word length and the pronunciation characteristics of different speakers, a multi-scale dynamic weighted gated TCN pyramid is proposed, in which a “weightor” network dynamically determines the weights of the different gated TCNs for each utterance. Since the strengths of branches with different temporal receptive fields appear complementary, the combination outperforms a single-branch system. For the objective, we propose to train the network by directly optimizing the utterance-level signal-to-distortion ratio (SDR) in a permutation invariant training (PIT) style. Our experiments on the WSJ0-2mix corpus yield an 18.4 dB SDR improvement, which shows that the proposed networks can improve performance on the speaker separation task.
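The utterance-level SDR objective in PIT style can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact SDR formula, so the sketch assumes the plain signal-to-error ratio 10·log10(‖s‖² / ‖s − ŝ‖²), and the names `sdr_db` and `pit_sdr` are hypothetical. Training would minimize the negative of the returned score in a differentiable framework.

```python
import itertools

import numpy as np


def sdr_db(reference, estimate, eps=1e-8):
    """Utterance-level SDR in dB, assumed here to be the plain
    signal-to-error ratio 10*log10(||s||^2 / ||s - s_hat||^2)."""
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))


def pit_sdr(references, estimates):
    """Permutation invariant SDR.

    references, estimates: arrays of shape (num_speakers, num_samples).
    Tries every assignment of estimates to references and returns the
    highest mean SDR together with the permutation that achieves it.
    """
    num_speakers = references.shape[0]
    best_score, best_perm = -np.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        score = np.mean(
            [sdr_db(references[i], estimates[p]) for i, p in enumerate(perm)]
        )
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


# Toy example: two random "speakers", with the estimates returned in
# swapped order and corrupted by a small amount of noise.
rng = np.random.default_rng(0)
references = rng.standard_normal((2, 16000))
estimates = references[::-1] + 0.01 * rng.standard_normal((2, 16000))

score, perm = pit_sdr(references, estimates)
print(perm, round(float(score), 1))
```

In the toy example the search recovers the swapped ordering `(1, 0)`, which is the point of PIT: the loss does not depend on the order in which the network emits the speakers. The exhaustive search costs a factorial number of evaluations in the speaker count, which is cheap for the two-speaker WSJ0-2mix setting.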


DOI: 10.21437/Interspeech.2019-1292

Cite as: Shi, Z., Lin, H., Liu, L., Liu, R., Hayakawa, S., Harada, S., Han, J. (2019) End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network. Proc. Interspeech 2019, 4614-4618, DOI: 10.21437/Interspeech.2019-1292.


@inproceedings{Shi2019,
  author={Ziqiang Shi and Huibin Lin and Liu Liu and Rujie Liu and Shoji Hayakawa and Shouji Harada and Jiqing Han},
  title={{End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4614--4618},
  doi={10.21437/Interspeech.2019-1292},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1292}
}