The SHNU System for Blizzard Challenge 2020

Laipeng He, Qiang Shi, Lang Wu, Jianqing Sun, Renke He, Yanhua Long, Jiaen Liang


This paper introduces the SHNU (team I) speech synthesis system for Blizzard Challenge 2020. The speech data released this year consists of two parts: a 9.5-hour Mandarin corpus from a native male speaker and a 3-hour Shanghainese corpus from a native female speaker. Based on these corpora, we built two neural network-based speech synthesis systems, one for each task, both using the same architecture. Specifically, each system comprises a front-end module, a Tacotron-based spectrogram prediction network and a WaveNet-based neural vocoder. First, a pre-built front-end module was used to generate character sequences and linguistic features from the training text. Then, a Tacotron-based sequence-to-sequence model was applied to predict mel-spectrograms from the character sequences. Finally, a WaveNet-based neural vocoder was adopted to reconstruct the audio waveform from the mel-spectrograms predicted by Tacotron. Evaluation results show that our system performed well on both tasks, supporting the effectiveness of the proposed approach.
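The mel-spectrogram is the interface between the Tacotron-based predictor (its training target) and the WaveNet-based vocoder (its conditioning input). The abstract does not give the feature-extraction settings, so the sketch below is a minimal NumPy illustration under assumed values: 16 kHz audio, a 1024-point FFT, hop 200, window 800, 80 HTK-style mel bands and natural-log compression. All function names and parameters are hypothetical and are not the SHNU configuration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    # Triangular filters spaced evenly on the mel scale; shape (n_mels, n_fft//2 + 1)
    fmax = sr / 2.0 if fmax is None else fmax
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wav, sr=16000, n_fft=1024, hop=200, win=800, n_mels=80):
    # Hypothetical settings; returns an array of shape (n_frames, n_mels)
    pad = n_fft // 2
    wav = np.pad(np.asarray(wav, dtype=float), (pad, pad), mode="reflect")
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[t * hop : t * hop + n_fft] for t in range(n_frames)])
    # Hann window of length `win`, zero-padded and centred inside each n_fft frame
    window = np.zeros(n_fft)
    start = (n_fft - win) // 2
    window[start : start + win] = np.hanning(win)
    magnitude = np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T
    # Floor before the log to avoid log(0) on silent frames
    return np.log(np.maximum(mel, 1e-5))
```

For example, one second of a 440 Hz tone at 16 kHz yields an (81, 80) matrix with these settings, and the band with the highest average energy is the one whose centre lies near 440 Hz. In a Tacotron/WaveNet pipeline, the same extraction must be used both for the acoustic model's targets and for the vocoder's conditioning, otherwise the vocoder sees mismatched features at synthesis time.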


DOI: 10.21437/VCC_BC.2020-2

Cite as: He, L., Shi, Q., Wu, L., Sun, J., He, R., Long, Y., Liang, J. (2020) The SHNU System for Blizzard Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 19-23, DOI: 10.21437/VCC_BC.2020-2.


@inproceedings{He2020,
  author={Laipeng He and Qiang Shi and Lang Wu and Jianqing Sun and Renke He and Yanhua Long and Jiaen Liang},
  title={{The SHNU System for Blizzard Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={19--23},
  doi={10.21437/VCC_BC.2020-2},
  url={http://dx.doi.org/10.21437/VCC_BC.2020-2}
}