ISCA Archive Interspeech 2021

CVC: Contrastive Learning for Non-Parallel Voice Conversion

Tingle Li, Yichen Liu, Chenxu Hu, Hang Zhao

Cycle-consistent generative adversarial network (CycleGAN) and variational autoencoder (VAE) based models have recently gained popularity in non-parallel voice conversion. However, they often suffer from a difficult training process and unsatisfactory results. In this paper, we propose a contrastive learning-based adversarial approach for voice conversion, namely contrastive voice conversion (CVC). Compared to previous CycleGAN-based methods, CVC only requires efficient one-way GAN training by taking advantage of contrastive learning. For non-parallel one-to-one voice conversion, CVC performs on par with or better than CycleGAN and VAE while substantially reducing training time. CVC further demonstrates superior performance in many-to-one voice conversion, enabling conversion from unseen speakers.
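The contrastive objective the abstract refers to can be illustrated with a patch-wise InfoNCE loss: a converted patch (query) is pulled toward the co-located source patch (positive) and pushed away from other patches (negatives), which removes the need for a second, cycle-direction GAN. The following is a minimal NumPy sketch of this idea for a single query; the function name and formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce(query, positive, negatives, tau=0.07):
    """Illustrative InfoNCE loss for one query embedding.

    query:     (d,)  embedding of a patch from the converted utterance
    positive:  (d,)  embedding of the co-located patch in the source
    negatives: (n, d) embeddings of other (non-co-located) patches
    tau:       temperature scaling the cosine similarities
    """
    # Normalize so dot products become cosine similarities.
    q = query / np.linalg.norm(query)
    pos = positive / np.linalg.norm(positive)
    negs = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)

    # Logits: positive similarity first, then all negative similarities.
    logits = np.concatenate(([q @ pos], negs @ q)) / tau

    # Cross-entropy with the positive treated as class 0.
    logits -= logits.max()  # for numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

Minimizing this loss over many patch locations encourages each converted patch to keep the content of its source patch, which is the role cycle consistency plays in CycleGAN, but with only a one-way generator.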


doi: 10.21437/Interspeech.2021-137

Cite as: Li, T., Liu, Y., Hu, C., Zhao, H. (2021) CVC: Contrastive Learning for Non-Parallel Voice Conversion. Proc. Interspeech 2021, 1324-1328, doi: 10.21437/Interspeech.2021-137

@inproceedings{li21d_interspeech,
  author={Tingle Li and Yichen Liu and Chenxu Hu and Hang Zhao},
  title={{CVC: Contrastive Learning for Non-Parallel Voice Conversion}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1324--1328},
  doi={10.21437/Interspeech.2021-137}
}