The Academia Sinica Systems of Voice Conversion for VCC2020

YuHuai Peng, Cheng-Hung Hu, Alexander Kang, Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang


This paper describes the Academia Sinica systems for the two tasks of the Voice Conversion Challenge 2020 (VCC2020), namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2). For both tasks, we followed the cascaded ASR+TTS structure, using phonetic tokens instead of text or characters as the TTS input. For Task 1, we used the International Phonetic Alphabet (IPA) as the input of the TTS model. For Task 2, we used unsupervised phonetic symbols extracted by a vector-quantized variational autoencoder (VQVAE). In the evaluation, the listening test results showed that our systems performed well in VCC2020.
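The Task 2 setup relies on the VQVAE turning continuous speech features into discrete symbols. A minimal sketch of that quantization step follows, assuming the standard VQ-VAE rule in which each encoder frame is replaced by the index of its nearest codebook vector under squared Euclidean distance. The function names `quantize` and `dedup`, the toy codebook, and the frame values are all hypothetical illustrations, not the authors' encoder, codebook size, or post-processing. Collapsing repeated indices in `dedup` is likewise an assumed step, shown only to illustrate how a frame-level index stream could become a shorter symbol sequence for a TTS front end.

```python
from typing import List, Sequence


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each encoder frame to the index of its nearest codebook vector
    (squared Euclidean distance), i.e. the usual VQ-VAE quantization rule."""
    tokens = []
    for frame in frames:
        # Distance from this frame to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(frame, code)) for code in codebook]
        # Index of the closest entry becomes the discrete "phonetic" token.
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


def dedup(tokens: List[int]) -> List[int]:
    """Hypothetical post-processing: collapse runs of the same index."""
    out: List[int] = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out


# Toy example: a 3-entry, 2-dimensional codebook (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.1], [0.2, 0.1], [0.9, 1.2], [1.1, 0.8], [-0.9, 1.1]]
print(quantize(frames, codebook))         # [0, 0, 1, 1, 2]
print(dedup(quantize(frames, codebook)))  # [0, 1, 2]
```

In a real VQVAE the codebook is learned jointly with the encoder and decoder, and the encoder output would come from speech features rather than hand-picked vectors; the sketch only shows how continuous frames become discrete token IDs.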


DOI: 10.21437/VCC_BC.2020-28

Cite as: Peng, Y., Hu, C., Kang, A., Lee, H., Chen, P., Tsao, Y., Wang, H. (2020) The Academia Sinica Systems of Voice Conversion for VCC2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 180-183, DOI: 10.21437/VCC_BC.2020-28.


@inproceedings{Peng2020,
  author={YuHuai Peng and Cheng-Hung Hu and Alexander Kang and Hung-Shin Lee and Pin-Yuan Chen and Yu Tsao and Hsin-Min Wang},
  title={{The Academia Sinica Systems of Voice Conversion for VCC2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={180--183},
  doi={10.21437/VCC_BC.2020-28},
  url={http://dx.doi.org/10.21437/VCC_BC.2020-28}
}