A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation

Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher

Domain adaptation plays an important role in speech recognition models, particularly for low-resource domains. We propose a novel generative model based on the cycle-consistent generative adversarial network (CycleGAN) for unsupervised non-parallel speech domain adaptation. The proposed model employs multiple independent discriminators on the power spectrogram, each in charge of a different frequency band. As a result, we have 1) better discriminators that focus on fine-grained details of the frequency features and 2) a generator that is capable of generating more realistic domain-adapted spectrograms. We demonstrate the effectiveness of our method on speech recognition with gender adaptation, where the model has access to supervised data from only one gender during training but is evaluated on the other at test time. Our model achieves average relative improvements over the baseline of 7.41% in phoneme error rate on TIMIT and 11.10% in word error rate on WSJ. Qualitatively, our model also generates more natural-sounding speech when conditioned on data from the other domain.
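To make the multi-discriminator idea concrete, here is a minimal sketch of how a spectrogram might be divided among per-band discriminators. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the bands are defined, so equal-width non-overlapping bands, the function names `split_into_bands` and `toy_discriminator`, the random weights, and the averaged loss are all hypothetical choices.

```python
import numpy as np


def split_into_bands(spectrogram, num_bands):
    """Split a (freq_bins, frames) array into contiguous frequency bands.

    Assumption: non-overlapping bands of (nearly) equal width.
    """
    return np.array_split(spectrogram, num_bands, axis=0)


def toy_discriminator(band, weights):
    """Hypothetical stand-in for a learned discriminator.

    Returns a sigmoid score in (0, 1) computed from the band's
    time-averaged energy per frequency bin.
    """
    logit = float(np.dot(weights, band.mean(axis=1)))
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)

# Random stand-in for a power spectrogram: 128 frequency bins, 50 frames.
spec = rng.random((128, 50))

# One independent discriminator per band, each with its own parameters.
bands = split_into_bands(spec, num_bands=4)
weights = [rng.standard_normal(band.shape[0]) for band in bands]
scores = [toy_discriminator(band, w) for band, w in zip(bands, weights)]

# Assumed aggregation: average the per-band generator losses so that
# every frequency band contributes to the generator's training signal.
generator_loss = -sum(np.log(s) for s in scores) / len(scores)
```

Because each discriminator sees only its own slice of the frequency axis, an unrealistic detail in one band cannot be masked by plausible content in the others, which is the intuition behind the abstract's claim about fine-grained frequency features.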

DOI: 10.21437/Interspeech.2018-1535

Cite as: Hosseini-Asl, E., Zhou, Y., Xiong, C., Socher, R. (2018) A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation. Proc. Interspeech 2018, 3758-3762, DOI: 10.21437/Interspeech.2018-1535.

  author={Ehsan Hosseini-Asl and Yingbo Zhou and Caiming Xiong and Richard Socher},
  title={A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation},
  booktitle={Proc. Interspeech 2018},