The Sogou System for Blizzard Challenge 2020

Fanbo Meng, Ruimin Wang, Peng Fang, Shuangyuan Zou, Wenjun Duan, Ming Zhou, Kai Liu, Wei Chen


In this paper, we introduce the text-to-speech system submitted by the Sogou team to the Blizzard Challenge 2020. The goal of this year’s challenge is to build a natural Mandarin Chinese speech synthesis system from a 10-hour corpus recorded by a native Chinese male speaker. We discuss the major modules of the submitted system: (1) a front-end module that analyzes the pronunciation and prosody of the text; (2) a FastSpeech-based sequence-to-sequence acoustic model that predicts acoustic features; (3) a WaveRNN-based neural vocoder that reconstructs waveforms. Evaluation results provided by the challenge organizer are also discussed.
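The three-module structure described in the abstract can be illustrated with a minimal sketch of the data flow. This is not the Sogou implementation: every function below is a hypothetical stub, and the constants (80 mel bins, a hop of 256 samples, 5 frames per unit) are assumptions chosen only so the shapes are concrete. The stubs emit placeholder values; a real system would replace them with a trained front-end, a FastSpeech-style acoustic model with predicted durations, and a WaveRNN-style vocoder.

```python
from dataclasses import dataclass
from typing import List

# All constants are illustrative assumptions, not values from the paper.
N_MELS = 80           # assumed size of one acoustic feature frame
HOP_LENGTH = 256      # assumed number of waveform samples per frame
FRAMES_PER_UNIT = 5   # stand-in for a predicted per-unit duration


@dataclass
class LinguisticUnit:
    symbol: str         # placeholder pronunciation symbol (here, the raw character)
    prosody_break: int  # placeholder prosodic boundary level


Frame = List[float]


def front_end(text: str) -> List[LinguisticUnit]:
    """Stub for module (1): text -> pronunciation and prosody labels."""
    units = [LinguisticUnit(ch, 0) for ch in text if not ch.isspace()]
    if units:
        units[-1].prosody_break = 3  # mark the utterance-final boundary
    return units


def acoustic_model(units: List[LinguisticUnit]) -> List[Frame]:
    """Stub for module (2): linguistic units -> acoustic feature frames."""
    # Each unit is expanded to a fixed number of frames; a real
    # FastSpeech-style model would use predicted durations instead.
    return [[0.0] * N_MELS for _ in units for _ in range(FRAMES_PER_UNIT)]


def vocoder(frames: List[Frame]) -> List[float]:
    """Stub for module (3): acoustic feature frames -> waveform samples."""
    return [0.0] * (len(frames) * HOP_LENGTH)


def synthesize(text: str) -> List[float]:
    """Chain the three modules in the order the abstract lists them."""
    return vocoder(acoustic_model(front_end(text)))
```

Calling `synthesize` on a two-character string passes two units through the front-end, ten frames through the acoustic model, and 2560 samples out of the vocoder under the assumed constants. The point of the sketch is only the interface between stages: each module consumes the previous module's output and can be developed or swapped independently.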


DOI: 10.21437/VCC_BC.2020-8

Cite as: Meng, F., Wang, R., Fang, P., Zou, S., Duan, W., Zhou, M., Liu, K., Chen, W. (2020) The Sogou System for Blizzard Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 49-53, DOI: 10.21437/VCC_BC.2020-8.


@inproceedings{Meng2020,
  author={Fanbo Meng and Ruimin Wang and Peng Fang and Shuangyuan Zou and Wenjun Duan and Ming Zhou and Kai Liu and Wei Chen},
  title={{The Sogou System for Blizzard Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={49--53},
  doi={10.21437/VCC_BC.2020-8},
  url={http://dx.doi.org/10.21437/VCC_BC.2020-8}
}