NUS-HLT System for Blizzard Challenge 2020

Yi Zhou, Xiaohai Tian, Xuehao Zhou, Mingyang Zhang, Grandee Lee, Riu Liu, Berrak Sisman, Haizhou Li


This paper presents the NUS-HLT text-to-speech (TTS) system for the Blizzard Challenge 2020. The challenge has two tasks: the Hub task 2020-MH1, to synthesize Mandarin Chinese given 9.5 hours of speech data from a male native speaker of Mandarin, and the Spoke task 2020-SS1, to synthesize Shanghainese given 3 hours of speech data from a female native speaker of Shanghainese. Our submitted system combines word embeddings extracted from a pre-trained language model with an end-to-end (E2E) TTS synthesizer to generate acoustic features from text input. A WaveRNN neural vocoder and a WaveNet neural vocoder generate speech waveforms from the acoustic features in the MH1 and SS1 tasks, respectively. Evaluation results provided by the challenge organizers demonstrate the effectiveness of our submitted TTS system.


DOI: 10.21437/VCC_BC.2020-7

Cite as: Zhou, Y., Tian, X., Zhou, X., Zhang, M., Lee, G., Liu, R., Sisman, B., Li, H. (2020) NUS-HLT System for Blizzard Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 44-48, DOI: 10.21437/VCC_BC.2020-7.


@inproceedings{Zhou2020,
  author={Yi Zhou and Xiaohai Tian and Xuehao Zhou and Mingyang Zhang and Grandee Lee and Riu Liu and Berrak Sisman and Haizhou Li},
  title={{NUS-HLT System for Blizzard Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={44--48},
  doi={10.21437/VCC_BC.2020-7},
  url={http://dx.doi.org/10.21437/VCC_BC.2020-7}
}