The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018

Yichiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda


This paper presents the NU non-parallel voice conversion (VC) system developed at Nagoya University for the SPOKE task of the Voice Conversion Challenge 2018 (VCC2018). The goal of the SPOKE task is to develop VC systems that do not require parallel training data. The key idea of our system is to use a text-to-speech (TTS) voice as a reference voice, which makes it possible to create two parallel training datasets: one between the source and TTS voices, and one between the TTS and target voices. Using these datasets, we develop a cascade VC system that converts the source voice into the target voice via the TTS voice as the reference. Furthermore, we propose a system selection framework to avoid generating collapsed speech waveforms, which are often observed when less accurately converted speech features are fed to the WaveNet vocoder. The VCC2018 results show that, among all submitted systems, our system ranked second in similarity (a similarity score of around 70%) and above average in naturalness (a mean opinion score of around 3.0).
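The cascade and the selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `make_cascade`, `select_waveform`, and `toy_is_collapsed` are hypothetical, the converters are stand-in functions rather than trained models, and the power-threshold collapse check is an assumption made purely for illustration, since the abstract does not state the actual selection criterion.

```python
from typing import Callable, List, Sequence

Features = List[float]
Converter = Callable[[Sequence[float]], Features]


def make_cascade(src_to_ref: Converter, ref_to_tgt: Converter) -> Converter:
    """Chain two converters: source -> reference (TTS) voice -> target."""
    def cascade(source_features: Sequence[float]) -> Features:
        # Stage 1: map source features onto the reference (TTS) voice.
        reference_features = src_to_ref(source_features)
        # Stage 2: map reference features onto the target voice.
        return ref_to_tgt(reference_features)
    return cascade


def toy_is_collapsed(waveform: Sequence[float], min_power: float = 1e-4) -> bool:
    """Hypothetical collapse check: flag empty or near-silent waveforms.

    This threshold test is an assumption for illustration only.
    """
    if not waveform:
        return True
    power = sum(x * x for x in waveform) / len(waveform)
    return power < min_power


def select_waveform(
    primary: Sequence[float],
    fallback: Sequence[float],
    is_collapsed: Callable[[Sequence[float]], bool],
) -> List[float]:
    """Keep the primary waveform unless it is judged collapsed."""
    return list(fallback) if is_collapsed(primary) else list(primary)


if __name__ == "__main__":
    # Stand-in converters; real ones would be trained on the two
    # parallel datasets (source-TTS and TTS-target).
    cascade = make_cascade(
        lambda f: [x + 1.0 for x in f],
        lambda f: [2.0 * x for x in f],
    )
    print(cascade([0.0, 1.0]))
    print(select_waveform([0.0, 0.0], [0.5, -0.5], toy_is_collapsed))
```

In this sketch, the primary waveform would stand for the output of the WaveNet vocoder and the fallback for the output of some alternative system; which systems are actually compared, and how, is left to the full paper.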


DOI: 10.21437/Odyssey.2018-30

Cite as: Wu, Y., Tobing, P.L., Hayashi, T., Kobayashi, K., Toda, T. (2018) The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 211-218, DOI: 10.21437/Odyssey.2018-30.


@inproceedings{Wu2018,
  author={Yichiao Wu and Patrick Lumban Tobing and Tomoki Hayashi and Kazuhiro Kobayashi and Tomoki Toda},
  title={The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={211--218},
  doi={10.21437/Odyssey.2018-30},
  url={http://dx.doi.org/10.21437/Odyssey.2018-30}
}