Submission from SRCB for Voice Conversion Challenge 2020

Qiuyue Ma, Ruolan Liu, Xue Wen, Chunhui Lu, Xiao Chen


This paper presents our intra-lingual and cross-lingual voice conversion system for the Voice Conversion Challenge 2020 (VCC 2020). Voice conversion (VC) modifies a source speaker's speech so that the result sounds like a target speaker. This becomes particularly difficult when the source and target speakers speak different languages. In this work we focus on building a voice conversion system that achieves consistent improvements in accent and intelligibility evaluations. Our system consists of a speech representation module based on bilingual phoneme recognition, a neural network based speech generation module, and a neural vocoder. More concretely, we extract general phonation from the source speakers' speech in different languages, and improve the sound quality by optimizing the speech synthesis module and adding a noise suppression post-processing module to the vocoder. This framework produces highly intelligible and highly natural speech that is very close to human quality (MOS = 4.17, rank 2 in Task 1; MOS = 4.13, rank 2 in Task 2).
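The staged structure described in the abstract (speech representation, speech generation, vocoder, noise suppression) can be pictured as a chain of functions. The sketch below is only an illustration of that data flow, not the authors' implementation: the class name, the field names, and the toy stage functions are all hypothetical, and the real modules are neural networks whose details are not given here.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]       # audio samples
Frames = List[List[float]]   # one feature vector per frame


@dataclass
class VoiceConversionPipeline:
    """Hypothetical container chaining the stages named in the abstract."""
    recognize: Callable[[Waveform], Frames]   # source audio -> phonetic representation
    generate: Callable[[Frames], Frames]      # phonetic representation -> target-voice acoustic features
    vocode: Callable[[Frames], Waveform]      # acoustic features -> waveform
    denoise: Callable[[Waveform], Waveform]   # noise suppression post-processing

    def convert(self, source_audio: Waveform) -> Waveform:
        phonetic = self.recognize(source_audio)
        acoustic = self.generate(phonetic)
        audio = self.vocode(acoustic)
        return self.denoise(audio)


# Toy stand-ins (assumptions for illustration only) so the sketch runs.
pipeline = VoiceConversionPipeline(
    recognize=lambda wav: [[x] for x in wav],
    generate=lambda frames: [[2 * v for v in f] for f in frames],
    vocode=lambda frames: [f[0] for f in frames],
    denoise=lambda wav: [0.0 if abs(x) < 0.1 else x for x in wav],
)

converted = pipeline.convert([0.01, 0.5, -0.25])
```

Keeping each stage behind a plain callable mirrors the modular design the abstract describes, where the synthesis module and the vocoder post-processing are improved independently.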


DOI: 10.21437/VCC_BC.2020-18

Cite as: Ma, Q., Liu, R., Wen, X., Lu, C., Chen, X. (2020) Submission from SRCB for Voice Conversion Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 131-135, DOI: 10.21437/VCC_BC.2020-18.


@inproceedings{Ma2020,
  author={Qiuyue Ma and Ruolan Liu and Xue Wen and Chunhui Lu and Xiao Chen},
  title={{Submission from SRCB for Voice Conversion Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={131--135},
  doi={10.21437/VCC_BC.2020-18},
  url={http://dx.doi.org/10.21437/VCC_BC.2020-18}
}