Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus

Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu


This paper proposes a novel approach to parallel-data-free, many-to-many voice conversion (VC). Because one-to-one conversion offers limited flexibility, research has shifted toward many-to-many conversion, in which speaker identity is often represented using the bases of a speaker space. In this case, however, utterances of the same sentences must be collected from many speakers. This study aims to overcome this constraint and realize parallel-data-free, many-to-many conversion. This is made possible by integrating deep neural networks (DNNs) with an eigenspace built from a non-parallel speech corpus. In our previous study, many-to-many conversion was implemented using a DNN whose training was assisted by eigenvoice GMM (EVGMM) conversion. By constructing the eigenspace from a non-parallel speech corpus, the function of the EVGMM is realized equivalently, which makes the desired conversion possible. A key technique here is to estimate the covariance terms without parallel data between source and target speakers. Experiments show that objective assessment scores are comparable to those of the baseline system trained with parallel data.
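The idea of representing speaker identity with speaker space bases can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows the generic eigenvoice-style representation, in which a speaker's supervector is approximated as a bias vector plus a weighted sum of basis vectors. The function names, the toy 3-dimensional vectors, and the assumption that the bases are orthonormal are all hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def speaker_weights(supervector: Vector, bias: Vector, bases: List[Vector]) -> Vector:
    """Project a speaker supervector onto the bases (assumed orthonormal).

    The result is a low-dimensional weight vector that stands in for
    the speaker's identity.
    """
    centered = [s - b for s, b in zip(supervector, bias)]
    return [dot(centered, basis) for basis in bases]


def reconstruct(weights: Vector, bias: Vector, bases: List[Vector]) -> Vector:
    """Rebuild an approximate supervector: bias + sum_k weights[k] * bases[k]."""
    out = list(bias)
    for w, basis in zip(weights, bases):
        for i, v in enumerate(basis):
            out[i] += w * v
    return out


# Toy data (hypothetical): a 3-dimensional supervector space with 2 bases.
bias = [1.0, 2.0, 3.0]
bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
speaker = [1.5, 1.0, 3.0]

weights = speaker_weights(speaker, bias, bases)
approx = reconstruct(weights, bias, bases)
```

In this toy case the speaker lies exactly in the span of the two bases, so the reconstruction matches the original supervector. In practice the bases would be estimated from many speakers, and the weight vector would only approximate each one.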


DOI: 10.21437/Interspeech.2017-961

Cite as: Hashimoto, T., Uchida, H., Saito, D., Minematsu, N. (2017) Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus. Proc. Interspeech 2017, 1278-1282, DOI: 10.21437/Interspeech.2017-961.


@inproceedings{Hashimoto2017,
  author={Tetsuya Hashimoto and Hidetsugu Uchida and Daisuke Saito and Nobuaki Minematsu},
  title={Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1278--1282},
  doi={10.21437/Interspeech.2017-961},
  url={http://dx.doi.org/10.21437/Interspeech.2017-961}
}