This paper presents the OPPO text-to-speech system for the Blizzard Challenge 2020. A system based on statistical parametric speech synthesis was built, with improvements to both the frontend and the backend. For the Mandarin task, a BERT model was used for the frontend, and a Tacotron acoustic model and a WaveRNN vocoder were used for the backend. For the Shanghainese task, the frontend was built from scratch, and a Tacotron acoustic model and a MelGAN vocoder were used for the backend. For the Mandarin task, the evaluation results showed that our proposed system performed best in naturalness and achieved near-best results in similarity. For the Shanghainese task, our system obtained poor results on most evaluation metrics.
Cite as: Song, Y., Liang, M., Yang, G., Xie, K., Hao, J. (2020) The OPPO System for the Blizzard Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 24-27, doi: 10.21437/VCCBC.2020-3
@inproceedings{song20_vccbc,
  author={Yang Song and Min Liang and Guilin Yang and Kun Xie and Jie Hao},
  title={{The OPPO System for the Blizzard Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={24--27},
  doi={10.21437/VCCBC.2020-3}
}