In this paper, we describe the baseline system of the Voice Conversion Challenge (VCC) 2020, which combines a cyclic variational autoencoder (CycleVAE) with Parallel WaveGAN (PWG), i.e., CycleVAEPWG. CycleVAE is a nonparallel VAE-based voice conversion method that feeds converted acoustic features back through the model so that cyclically reconstructed spectra are also considered during optimization. PWG, in turn, is a non-autoregressive neural vocoder based on a generative adversarial network that provides high-quality and fast waveform generation. In practice, the CycleVAEPWG system can be developed straightforwardly with the VCC 2020 dataset using a unified model for both Task 1 (intralingual) and Task 2 (cross-lingual). Our open-source implementation is available at https://github.com/bigpon/vcc20_baseline_cyclevae. The results of VCC 2020 show that the CycleVAEPWG baseline achieves 1) a mean opinion score (MOS) of 2.87 in naturalness and a speaker similarity percentage (Sim) of 75.37% for Task 1, and 2) a MOS of 2.56 and a Sim of 56.46% for Task 2, i.e., a roughly average score for naturalness and an above-average score for speaker similarity.
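The cyclic reconstruction idea mentioned above can be illustrated with a minimal sketch. This is not the released implementation: the linear encoder and decoder, the one-hot speaker code, the L1 loss, and all names (`make_toy_model`, `encode`, `decode`, `cycle_pass`, `cycle_loss`) are assumptions chosen for illustration, and the sketch omits the latent sampling and KL regularization that a VAE would include.

```python
import numpy as np

def make_toy_model(feat_dim, latent_dim, n_speakers, seed=0):
    """Build hypothetical linear encoder/decoder weights (illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "enc": rng.standard_normal((feat_dim, latent_dim)) * 0.1,
        "dec": rng.standard_normal((latent_dim + n_speakers, feat_dim)) * 0.1,
        "n_speakers": n_speakers,
    }

def encode(model, x):
    # Toy stand-in for the encoder: returns a deterministic latent (no sampling).
    return x @ model["enc"]

def decode(model, z, speaker_id):
    # Condition the decoder on a one-hot speaker code appended to the latent.
    code = np.zeros((z.shape[0], model["n_speakers"]))
    code[:, speaker_id] = 1.0
    return np.concatenate([z, code], axis=1) @ model["dec"]

def cycle_pass(model, x, src, tgt):
    """Return reconstructed, converted, and cyclically reconstructed features."""
    z = encode(model, x)
    recon = decode(model, z, src)        # source latent + source speaker code
    converted = decode(model, z, tgt)    # source latent + target speaker code
    z_cyc = encode(model, converted)     # feed converted features back in
    cyclic = decode(model, z_cyc, src)   # map back to the source speaker
    return recon, converted, cyclic

def cycle_loss(model, x, src, tgt):
    # Both the direct and the cyclic reconstruction are compared to the input.
    recon, _, cyclic = cycle_pass(model, x, src, tgt)
    return np.mean(np.abs(x - recon)) + np.mean(np.abs(x - cyclic))
```

The point of the sketch is that the converted features, for which no parallel target exists, still contribute to the objective through the cyclic term, since they must be mappable back to the original source features.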
Cite as: Tobing, P.L., Wu, Y.-C., Toda, T. (2020) Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 155-159, doi: 10.21437/VCCBC.2020-23
@inproceedings{tobing20_vccbc,
  author={Patrick Lumban Tobing and Yi-Chiao Wu and Tomoki Toda},
  title={{Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={155--159},
  doi={10.21437/VCCBC.2020-23}
}