Average Modeling Approach to Voice Conversion with Non-Parallel Data

Xiaohai Tian, Junchao Wang, Haihua Xu, Eng-Siong Chng, Haizhou Li


Voice conversion techniques typically require source-target parallel speech data for model training. Such parallel data may not always be available in practice. This paper presents a non-parallel data approach that we call the average modeling approach. The proposed approach makes use of a multi-speaker average model that maps speaker-independent linguistic features to speaker-dependent acoustic features. In particular, we present two practical implementations: 1) adapting the average model towards the target speaker with a small amount of target data, and 2) presenting speaker identity as an additional input to the average model to generate target speech. As the linguistic and acoustic features can be extracted from the same utterance, the proposed approach does not require parallel data in either average model training or adaptation. We report experiments on the Voice Conversion Challenge 2018 (VCC2018) database that validate the effectiveness of the proposed method.
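
The two implementations named in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a ridge-regularised linear map stands in for the average model, the features are synthetic, each speaker's identity is reduced to a constant acoustic offset, and speaker identity is encoded as a one-hot vector. The names `fit`, `adapt`, and `with_code` are hypothetical. The point is only that each training pair (linguistic, acoustic) comes from the same utterance, so no source-target parallel data is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
D_LING, D_AC = 8, 4  # hypothetical linguistic / acoustic feature sizes
A_true = rng.normal(size=(D_LING, D_AC))  # toy shared linguistic-to-acoustic map

def make_speaker(offset, n_frames):
    """Toy data: linguistic X and acoustic Y are taken from the same frames."""
    X = rng.normal(size=(n_frames, D_LING))
    Y = X @ A_true + offset + 0.1 * rng.normal(size=(n_frames, D_AC))
    return X, Y

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def fit(X, Y, ridge=1e-3):
    """Ridge least squares; a linear stand-in for the average model."""
    Xb = add_bias(X)
    return np.linalg.solve(Xb.T @ Xb + ridge * np.eye(Xb.shape[1]), Xb.T @ Y)

def predict(W, X):
    return add_bias(X) @ W

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def adapt(W_avg, X, Y, lr=0.05, steps=300):
    """Variant 1: fine-tune the average model on a little target data."""
    W, Xb = W_avg.copy(), add_bias(X)
    for _ in range(steps):
        W -= lr * Xb.T @ (Xb @ W - Y) / len(Xb)  # gradient step on squared error
    return W

def with_code(X, spk, n_spk):
    """Variant 2: append a one-hot speaker code to the linguistic input."""
    code = np.zeros((len(X), n_spk))
    code[:, spk] = 1.0
    return np.hstack([X, code])

# Multi-speaker average model: pool (linguistic, acoustic) pairs of all speakers.
train = [make_speaker(offset, 200) for offset in (-2.0, 0.0, 2.0)]
X_all = np.vstack([x for x, _ in train])
Y_all = np.vstack([y for _, y in train])
W_avg = fit(X_all, Y_all)

# Variant 1: adapt towards an unseen target speaker using 50 frames.
X_ad, Y_ad = make_speaker(3.0, 50)
X_te, Y_te = make_speaker(3.0, 200)
W_tgt = adapt(W_avg, X_ad, Y_ad)
mse_avg = mse(predict(W_avg, X_te), Y_te)
mse_adapted = mse(predict(W_tgt, X_te), Y_te)

# Variant 2: one model for all speakers, conditioned on the speaker code.
X_code = np.vstack([with_code(x, i, 3) for i, (x, _) in enumerate(train)])
W_code = fit(X_code, Y_all)
X_src, Y_ref = make_speaker(2.0, 200)  # reference frames for training speaker 2
mse_coded = mse(predict(W_code, with_code(X_src, 2, 3)), Y_ref)
mse_uncoded = mse(predict(W_avg, X_src), Y_ref)
```

In this toy setting, conversion amounts to extracting linguistic features from a source utterance and passing them through either the adapted model or the speaker-coded model with the target speaker's code. A real system would replace the linear map with a neural network and the synthetic features with features extracted from speech.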


DOI: 10.21437/Odyssey.2018-32

Cite as: Tian, X., Wang, J., Xu, H., Chng, E., Li, H. (2018) Average Modeling Approach to Voice Conversion with Non-Parallel Data. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 227-232, DOI: 10.21437/Odyssey.2018-32.


@inproceedings{Tian2018,
  author={Xiaohai Tian and Junchao Wang and Haihua Xu and Eng-Siong Chng and Haizhou Li},
  title={Average Modeling Approach to Voice Conversion with Non-Parallel Data},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={227--232},
  doi={10.21437/Odyssey.2018-32},
  url={http://dx.doi.org/10.21437/Odyssey.2018-32}
}