ISCA Archive Interspeech 2013

Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression

Hanna Silén, Jani Nurminen, Elina Helander, Moncef Gabbouj

Voice conversion aims at converting speech from one speaker to sound as if it were spoken by another specific speaker. The most popular voice conversion approach, based on Gaussian mixture modeling, tends to suffer from either model overfitting or oversmoothing. To overcome the shortcomings of the traditional approach, we recently proposed to use dynamic kernel partial least squares (DKPLS) regression in the framework of parallel-data voice conversion. However, the availability of parallel training data from both the source and target speaker is not always guaranteed. In this paper, we extend the DKPLS-based conversion approach to non-parallel data by combining it with the well-known INCA alignment algorithm. The listening test results indicate that high-quality conversion can be achieved with the proposed combination. Furthermore, the performance of two variations of INCA is evaluated with both intra-lingual and cross-lingual data.
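The abstract does not spell out how the alignment works. As a rough illustration only, the sketch below assumes the general idea behind INCA-style alignment: alternate between a nearest-neighbor search that pairs source and target frames, and a conversion step trained on those pairs. The function names (`nearest_neighbors`, `inca_style_align`) and the iteration count are hypothetical, and a plain affine least-squares map stands in for the conversion function; it is not the DKPLS regression used in the paper.

```python
import numpy as np


def nearest_neighbors(a, b):
    """For each row of a, return the index of the closest row of b (Euclidean)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def inca_style_align(src, tgt, n_iter=10):
    """Pair source and target frames iteratively, without parallel data.

    Assumption: an affine least-squares map replaces the conversion
    function (the paper uses DKPLS regression at this step).
    """
    aux = src.copy()  # auxiliary vectors start as the unconverted source
    src1 = np.hstack([src, np.ones((len(src), 1))])  # source with bias column
    for _ in range(n_iter):
        # Nearest-neighbor step, searched in both directions.
        s2t = nearest_neighbors(aux, tgt)  # closest target for each source frame
        t2s = nearest_neighbors(tgt, aux)  # closest source for each target frame
        # Conversion step: fit the map on the pooled frame pairs.
        x = np.vstack([src, src[t2s]])
        y = np.vstack([tgt[s2t], tgt])
        x1 = np.hstack([x, np.ones((len(x), 1))])
        w, *_ = np.linalg.lstsq(x1, y, rcond=None)
        # Move the auxiliary vectors toward the target space and repeat.
        aux = src1 @ w
    return s2t, t2s, w
```

Here `src` and `tgt` would be arrays of per-frame spectral features with shapes `(n_source_frames, dim)` and `(n_target_frames, dim)`. The returned index arrays give the frame pairing from the last search step, on which a final conversion model could then be trained.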


doi: 10.21437/Interspeech.2013-103

Cite as: Silén, H., Nurminen, J., Helander, E., Gabbouj, M. (2013) Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression. Proc. Interspeech 2013, 373-377, doi: 10.21437/Interspeech.2013-103

@inproceedings{silen13_interspeech,
  author={Hanna Silén and Jani Nurminen and Elina Helander and Moncef Gabbouj},
  title={{Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={373--377},
  doi={10.21437/Interspeech.2013-103}
}