The Ajmide Text-To-Speech System for Blizzard Challenge 2020

Beibei Hu, Zilong Bai, Qiang Li


This paper describes the NLPR speech synthesis system entry for Blizzard Challenge 2020. More than 9 hours of speech from a news anchor and 3 hours of speech from one native Shanghainese speaker are used as training data for this year's system. Our system is built on a multi-speaker end-to-end speech synthesis framework, and an LPCNet-based neural vocoder is adapted to improve speech quality. Compared with our previous system, the data pruning and speaker adaptation strategies were improved to make the system more stable. In this paper, we introduce and discuss the overall system structure, the data pruning method, and the duration control. Because this year's challenge includes two tasks, Mandarin and Shanghainese, we describe the key components for each task in turn. Finally, the results of the listening tests are presented.


DOI: 10.21437/VCC_BC.2020-13

Cite as: Hu, B., Bai, Z., Li, Q. (2020) The Ajmide Text-To-Speech System for Blizzard Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 75-79, DOI: 10.21437/VCC_BC.2020-13.


@inproceedings{Hu2020,
  author={Beibei Hu and Zilong Bai and Qiang Li},
  title={{The Ajmide Text-To-Speech System for Blizzard Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={75--79},
  doi={10.21437/VCC_BC.2020-13},
  url={http://dx.doi.org/10.21437/VCC_BC.2020-13}
}