The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion

Ling-Hui Chen, Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai


This paper introduces the methods we adopted to build our system for the Voice Conversion Challenge (VCC) 2016 evaluation. We propose neural network-based approaches to convert both spectral and excitation features. First, a generatively trained deep neural network (GTDNN) is adopted for spectral envelope conversion, after the spectral envelopes have been pre-processed by frequency warping. Second, we propose to use a recurrent neural network (RNN) with long short-term memory (LSTM) cells for F0 trajectory conversion. In addition, we adopt a DNN for band aperiodicity conversion. Both internal tests and the formal VCC evaluation results demonstrate the effectiveness of the proposed methods.
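
As a rough illustration of the LSTM-based F0 trajectory conversion mentioned in the abstract, the sketch below sets up a sequence-to-sequence regression model in PyTorch. It is not the authors' implementation: the layer sizes, the one-dimensional log-F0 input/output representation, the MSE objective, and the dummy parallel training data are all assumptions made for illustration only.

# Minimal sketch (assumptions noted above), not the authors' code.
import torch
import torch.nn as nn

class F0ConversionLSTM(nn.Module):
    """Maps a source speaker's log-F0 trajectory to a target speaker's."""
    def __init__(self, input_dim=1, hidden_dim=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, input_dim)

    def forward(self, src_f0):
        # src_f0: (batch, frames, 1) sequence of source log-F0 values
        hidden, _ = self.lstm(src_f0)
        return self.proj(hidden)  # predicted target log-F0 trajectory

model = F0ConversionLSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on dummy time-aligned source/target F0
# (32 segments of 200 frames each); real training would use parallel
# F0 trajectories extracted by a vocoder and aligned across speakers.
src = torch.randn(32, 200, 1)
tgt = torch.randn(32, 200, 1)
loss = criterion(model(src), tgt)
optimizer.zero_grad()
loss.backward()
optimizer.step()
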


DOI: 10.21437/Interspeech.2016-456

Cite as

Chen, L.-H., Liu, L.-J., Ling, Z.-H., Jiang, Y., Dai, L.-R. (2016) The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion. Proc. Interspeech 2016, 1642-1646.

BibTeX
@inproceedings{Chen+2016,
  author    = {Ling-Hui Chen and Li-Juan Liu and Zhen-Hua Ling and Yuan Jiang and Li-Rong Dai},
  title     = {The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion},
  year      = {2016},
  booktitle = {Interspeech 2016},
  doi       = {10.21437/Interspeech.2016-456},
  url       = {http://dx.doi.org/10.21437/Interspeech.2016-456},
  pages     = {1642--1646}
}