This paper presents the NUS & NWPU voice conversion system for Voice Conversion Challenge 2020. Our submission is a Phonetic PosteriorGram (PPG) based voice conversion system consisting of three modules: a PPG extractor, a feature conversion module, and a speech signal generation module. First, the PPG extractor extracts speaker-independent content features from a speech signal. Then, an encoder-decoder based feature conversion model predicts the converted features from the PPG inputs. Finally, a multiband WaveRNN generates the time-domain speech signal from the converted features. The same implementation is used for both the intra-lingual and cross-lingual voice conversion tasks. Evaluation results demonstrate the effectiveness of the proposed system.
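The data flow through the three modules can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PPGVoiceConversion`, the three `toy_*` functions, and the frame size `HOP` are hypothetical stand-ins chosen only to show how the stages connect. A real system would use a trained PPG extractor, a trained encoder-decoder conversion model, and a multiband WaveRNN vocoder in their place.

```python
from dataclasses import dataclass
from typing import Callable, List

Frames = List[List[float]]  # one feature vector per frame
Waveform = List[float]      # time-domain samples


@dataclass
class PPGVoiceConversion:
    """Three-stage pipeline; each stage is a placeholder callable (assumption)."""
    extract_ppg: Callable[[Waveform], Frames]
    convert_features: Callable[[Frames], Frames]
    vocode: Callable[[Frames], Waveform]

    def convert(self, source_wave: Waveform) -> Waveform:
        ppg = self.extract_ppg(source_wave)    # speaker-independent content
        features = self.convert_features(ppg)  # converted acoustic features
        return self.vocode(features)           # time-domain speech signal


HOP = 4  # samples per frame; arbitrary value for this toy example


def toy_ppg_extractor(wave: Waveform) -> Frames:
    """Stand-in for a PPG extractor: a two-class 'posterior' per frame."""
    frames = []
    for start in range(0, len(wave) - HOP + 1, HOP):
        chunk = wave[start:start + HOP]
        energy = sum(abs(x) for x in chunk) / HOP
        p = min(max(energy, 0.0), 1.0)
        frames.append([p, 1.0 - p])  # each row sums to 1, like a posterior
    return frames


def toy_feature_converter(ppg: Frames) -> Frames:
    """Stand-in for the encoder-decoder model: one value per frame."""
    return [[2.0 * frame[0] - 1.0] for frame in ppg]


def toy_vocoder(features: Frames) -> Waveform:
    """Stand-in for the neural vocoder: hold each frame value for HOP samples."""
    wave: Waveform = []
    for frame in features:
        wave.extend([frame[0]] * HOP)
    return wave


system = PPGVoiceConversion(toy_ppg_extractor, toy_feature_converter, toy_vocoder)
source = [0.0, 0.5, -0.5, 1.0, 0.25, 0.25, 0.25, 0.25]
converted = system.convert(source)
```

Because the only speaker-dependent stages are the conversion model and the vocoder, the same pipeline object can in principle serve both intra-lingual and cross-lingual conversion, which matches the abstract's statement that one implementation covers both tasks.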
Cite as: Tian, X., Wang, Z., Yang, S., Zhou, X., Du, H., Zhou, Y., Zhang, M., Zhou, K., Sisman, B., Xie, L., Li, H. (2020) The NUS & NWPU system for Voice Conversion Challenge 2020. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 170-174, doi: 10.21437/VCCBC.2020-26
@inproceedings{tian20b_vccbc,
  author={Xiaohai Tian and Zhichao Wang and Shan Yang and Xinyong Zhou and Hongqiang Du and Yi Zhou and Mingyang Zhang and Kun Zhou and Berrak Sisman and Lei Xie and Haizhou Li},
  title={{The NUS & NWPU system for Voice Conversion Challenge 2020}},
  year=2020,
  booktitle={Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  pages={170--174},
  doi={10.21437/VCCBC.2020-26}
}