This paper presents our intra-lingual and cross-lingual voice conversion system for the Voice Conversion Challenge 2020 (VCC 2020). Voice conversion (VC) modifies a source speaker’s speech so that the result sounds like a target speaker. This becomes particularly difficult when the source and target speakers speak different languages. In this work we focus on building a voice conversion system that achieves consistent improvements in accent and intelligibility evaluations. Our system consists of a speech representation module based on bilingual phoneme recognition, a neural-network-based speech generation module, and a neural vocoder. More concretely, we extract general phonation from source speakers' speech in different languages, and we improve sound quality by optimizing the speech synthesis module and adding a noise suppression post-processing module to the vocoder. This framework yields highly intelligible and highly natural speech that is very close to human quality (MOS = 4.17, rank 2 in Task 1; MOS = 4.13, rank 2 in Task 2).
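The data flow described above (speech representation, speech generation, vocoder, noise suppression) can be sketched as a simple composition of stages. This is only an illustrative sketch, not the authors' implementation: the class `VoiceConversionPipeline`, the `toy_*` functions, the speaker IDs `spk_a` and `spk_b`, and all numeric parameters are hypothetical stand-ins chosen so the example runs without any trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frames = List[List[float]]  # one feature vector per frame


@dataclass
class VoiceConversionPipeline:
    """Hypothetical wiring of the four stages named in the abstract."""
    recognizer: Callable[[Sequence[float]], Frames]  # waveform -> speaker-independent features
    generator: Callable[[Frames, str], Frames]       # features + target speaker -> acoustic features
    vocoder: Callable[[Frames], List[float]]         # acoustic features -> waveform
    denoiser: Callable[[List[float]], List[float]]   # noise suppression post-process

    def convert(self, waveform: Sequence[float], target_speaker: str) -> List[float]:
        phonetic = self.recognizer(waveform)
        acoustic = self.generator(phonetic, target_speaker)
        return self.denoiser(self.vocoder(acoustic))


# Toy stand-ins (assumptions): real stages would be trained neural models.
FRAME_LEN = 4


def toy_recognizer(waveform: Sequence[float]) -> Frames:
    # One feature per non-overlapping frame: mean absolute amplitude.
    return [
        [sum(abs(x) for x in waveform[i:i + FRAME_LEN]) / FRAME_LEN]
        for i in range(0, len(waveform) - FRAME_LEN + 1, FRAME_LEN)
    ]


def toy_generator(frames: Frames, target_speaker: str) -> Frames:
    # Pretend each target speaker is characterised by a fixed gain.
    gain = {"spk_a": 0.5, "spk_b": 2.0}[target_speaker]
    return [[gain * v for v in frame] for frame in frames]


def toy_vocoder(frames: Frames) -> List[float]:
    # Hold each frame value for FRAME_LEN samples.
    return [frame[0] for frame in frames for _ in range(FRAME_LEN)]


def toy_denoiser(waveform: List[float], floor: float = 0.01) -> List[float]:
    # Zero out samples whose magnitude is below a small floor.
    return [0.0 if abs(x) < floor else x for x in waveform]


pipeline = VoiceConversionPipeline(
    recognizer=toy_recognizer,
    generator=toy_generator,
    vocoder=toy_vocoder,
    denoiser=toy_denoiser,
)
converted = pipeline.convert([0.5] * 8, "spk_b")
```

Keeping the stages as separate callables reflects the modular description in the abstract: because the recognizer output is meant to be speaker-independent, the same generator and vocoder can in principle serve both the intra-lingual and the cross-lingual task.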
Cite as: Ma, Q., Liu, R., Wen, X., Lu, C., Chen, X. (2020) Submission from SRCB for Voice Conversion Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 131-135, doi: 10.21437/VCCBC.2020-18
@inproceedings{ma20_vccbc,
  author={Qiuyue Ma and Ruolan Liu and Xue Wen and Chunhui Lu and Xiao Chen},
  title={{Submission from SRCB for Voice Conversion Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={131--135},
  doi={10.21437/VCCBC.2020-18}
}