Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation

Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su, Dong Yu


We explore generative adversarial networks (GANs) for speech separation, particularly with permutation invariant training (SSGAN-PIT). Prior work has demonstrated that GANs can suppress additive noise in noisy speech waveforms and improve perceptual speech quality. In this work, we train GANs for speech separation to enhance multiple speech sources simultaneously, with the permutation issue addressed by utterance-level PIT in the training of the generator network. We propose operating GANs on the power spectrum domain instead of waveforms to reduce computation. To better exploit temporal dependencies, recurrent neural networks (RNNs) with long short-term memory (LSTM) are adopted for both the generator and the discriminator in this study. We evaluated SSGAN-PIT on the WSJ0 two-talker mixed speech separation task and found that SSGAN-PIT outperforms SSGAN without PIT as well as neural-network-based speech separation with or without PIT. The evaluation confirms the feasibility of the proposed model and training approach for efficient speech separation. The convergence behavior of permutation invariant training and adversarial training is analyzed.
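The utterance-level PIT idea mentioned in the abstract can be illustrated with a minimal sketch: for each permutation of the estimated sources, compute the total reconstruction error against the reference sources, and train with the permutation that gives the minimum loss. This is a hypothetical NumPy illustration of the general PIT criterion, not the authors' implementation; the function name and array shapes are assumptions for the example.

```python
# Hypothetical sketch of an utterance-level PIT loss (not the paper's code).
# estimates, references: arrays of shape (S, T, F) -- S sources,
# T time frames, F frequency bins of the power spectrum.
import itertools
import numpy as np

def pit_mse_loss(estimates, references):
    """Return (min_loss, best_perm): the MSE under the best assignment
    of estimated sources to reference sources, minimized over all S!
    permutations computed on the whole utterance."""
    num_sources = estimates.shape[0]
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(num_sources)):
        # Reorder the estimates according to this candidate assignment.
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

The minimum over permutations makes the loss invariant to the arbitrary ordering of speakers in the training targets; the cost grows as S!, which is negligible for the two-talker case considered here.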


DOI: 10.21437/Interspeech.2018-1603

Cite as: Chen, L., Yu, M., Qian, Y., Su, D., Yu, D. (2018) Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation. Proc. Interspeech 2018, 302-306, DOI: 10.21437/Interspeech.2018-1603.


@inproceedings{Chen2018,
  author={Lianwu Chen and Meng Yu and Yanmin Qian and Dan Su and Dong Yu},
  title={Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={302--306},
  doi={10.21437/Interspeech.2018-1603},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1603}
}