The NLPR Speech Synthesis entry for Blizzard Challenge 2020

Tao Wang, Jianhua Tao, Ruibo Fu, Zhengqi Wen, Chunyu Qiang


This paper describes the NLPR speech synthesis system entry for Blizzard Challenge 2020. This year's training data comprise more than 9 hours of speech from a news anchor and 3 hours of speech from a native Shanghainese speaker. Our system is built on a multi-speaker end-to-end speech synthesis framework, and an LPCNet-based neural vocoder is adapted to improve speech quality. Compared with our previous system, we improved the data pruning and speaker adaptation strategies to make the system more stable. This paper introduces and discusses the overall system structure, the data pruning method, and duration control. The challenge includes two tasks, Mandarin and Shanghainese, and we describe the key components of each. Finally, the results of the listening test are presented.


DOI: 10.21437/VCC_BC.2020-12

Cite as: Wang, T., Tao, J., Fu, R., Wen, Z., Qiang, C. (2020) The NLPR Speech Synthesis entry for Blizzard Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 70-74, DOI: 10.21437/VCC_BC.2020-12.


@inproceedings{Wang2020,
  author={Tao Wang and Jianhua Tao and Ruibo Fu and Zhengqi Wen and Chunyu Qiang},
  title={{The NLPR Speech Synthesis entry for Blizzard Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={70--74},
  doi={10.21437/VCC_BC.2020-12},
  url={http://dx.doi.org/10.21437/VCC_BC.2020-12}
}