Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks

Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma


This paper proposes a fast learning framework for non-parallel many-to-many voice conversion with residual Star Generative Adversarial Networks (StarGAN). Building on the state-of-the-art StarGAN-VC approach, which learns an unreferenced mapping between a group of speakers’ acoustic features for non-parallel many-to-many voice conversion, our method, which we call Res-StarGAN-VC, adds an enhancement by incorporating a residual mapping. The idea is to leverage the linguistic content shared between source and target features during conversion. The residual mapping is realized with identity shortcut connections from the input to the output of the generator in Res-StarGAN-VC. These shortcut connections accelerate the learning process of the network without increasing the number of parameters or the computational complexity, and they help the generator produce high-quality fake samples from the very beginning of adversarial training. Experiments and subjective evaluations show that, compared with the StarGAN-VC baseline, the proposed method offers (1) significantly faster convergence in adversarial training and (2) clearer pronunciation and better speaker similarity of the converted speech, on both mono-lingual and cross-lingual many-to-many voice conversion tasks.
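The identity shortcut described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the single-layer `generator_core`, the weight matrix `W`, and the omission of speaker conditioning are all simplifying assumptions. The point is only the residual form, output = G(x) + x, which makes the generator's output equal its input when the learned part contributes nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the generator network (the paper's actual
# architecture and speaker conditioning are not shown in the abstract).
def generator_core(x, W):
    return np.tanh(x @ W)

def res_generator(x, W):
    # Identity shortcut from input to output: the network only has to
    # learn the residual between source and target acoustic features.
    return generator_core(x, W) + x

x = rng.standard_normal((1, 8))  # one source acoustic feature frame
W = np.zeros((8, 8))             # untrained (zero) weights

# With untrained weights the residual branch outputs zero, so the
# "converted" sample equals the source frame -- already a plausible
# sample at the very start of adversarial training.
y = res_generator(x, W)
assert np.allclose(y, x)
```

This illustrates why the shortcut speeds up adversarial training: even before the generator has learned anything, its outputs lie on the speech-feature manifold rather than being noise, so the discriminator receives useful gradients immediately.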


 DOI: 10.21437/Interspeech.2019-2067

Cite as: Zhao, S., Nguyen, T.H., Wang, H., Ma, B. (2019) Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks. Proc. Interspeech 2019, 689-693, DOI: 10.21437/Interspeech.2019-2067.


@inproceedings{Zhao2019,
  author={Shengkui Zhao and Trung Hieu Nguyen and Hao Wang and Bin Ma},
  title={{Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks}},
  year={2019},
  booktitle={Proc. Interspeech 2019},
  pages={689--693},
  doi={10.21437/Interspeech.2019-2067},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2067}
}