ISCA Archive Interspeech 2021

Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

This study presents a framework for efficient and stable adversarial learning of unsupervised multichannel source separation models. When paired data, i.e., mixtures and the corresponding clean speech, are not available for training, it is promising to exploit generative adversarial networks (GANs), where the source separation system is treated as a generator and trained to bring the distribution of the separated (fake) speech closer to that of the clean (real) speech. The separated speech, however, contains many errors, especially when the system is trained without supervision, and can therefore be easily distinguished from clean speech. A binary real/fake discriminator will consequently halt the adversarial learning process unreasonably early. This study aims to balance the convergence of the generator and the discriminator to achieve efficient and stable learning. For that purpose, an autoencoder-based discriminator and the more stable adversarial loss designed in the boundary equilibrium GAN (BEGAN) are introduced. In addition, generator-specific distortions are added to the real examples so that the model is trained to focus only on source separation. Experimental comparisons demonstrated that the proposed stabilization techniques improved the performance of multiple unsupervised source separation systems.
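For readers unfamiliar with BEGAN, the following is a minimal PyTorch-style sketch of the equilibrium mechanism the abstract refers to, with the separator playing the role of the generator. It is an illustration of the standard BEGAN objective, not the paper's actual implementation; all names (began_step, separator, autoencoder_disc) and the hyperparameter values (gamma, lambda_k) are hypothetical.

    import torch
    import torch.nn.functional as F

    def began_step(separator, autoencoder_disc, mixture, real_speech,
                   k, gamma=0.5, lambda_k=1e-3):
        """One BEGAN-style update; k balances generator and discriminator."""
        fake_speech = separator(mixture)

        # The autoencoder's reconstruction error plays the role of the
        # critic score: real speech should reconstruct well, fake poorly.
        loss_real = F.l1_loss(autoencoder_disc(real_speech), real_speech)
        fake_detached = fake_speech.detach()
        loss_fake = F.l1_loss(autoencoder_disc(fake_detached), fake_detached)

        # Discriminator loss, weighted by the equilibrium variable k.
        loss_disc = loss_real - k * loss_fake

        # The separator (generator) is trained so that its output becomes
        # easy for the autoencoder to reconstruct.
        loss_gen = F.l1_loss(autoencoder_disc(fake_speech), fake_speech)

        # Closed-loop update keeps loss_fake near gamma * loss_real, which
        # prevents the discriminator from winning too early.
        k = k + lambda_k * (gamma * loss_real.item() - loss_fake.item())
        k = min(max(k, 0.0), 1.0)

        return loss_disc, loss_gen, k

Because the discriminator outputs a reconstruction error rather than a hard real/fake decision, its feedback stays informative even when the separated speech is still clearly imperfect, which is the stability property the abstract highlights.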


doi: 10.21437/Interspeech.2021-523

Cite as: Nakagome, Y., Togami, M., Ogawa, T., Kobayashi, T. (2021) Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation. Proc. Interspeech 2021, 3051-3055, doi: 10.21437/Interspeech.2021-523

@inproceedings{nakagome21_interspeech,
  author={Yu Nakagome and Masahito Togami and Tetsuji Ogawa and Tetsunori Kobayashi},
  title={{Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={3051--3055},
  doi={10.21437/Interspeech.2021-523}
}