14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Voice Conversion for Non-Parallel Datasets Using Dynamic Kernel Partial Least Squares Regression

Hanna Silén, Jani Nurminen, Elina Helander, Moncef Gabbouj

Tampere University of Technology, Finland

Voice conversion aims at converting speech from one speaker to sound as if it were spoken by another specific speaker. The most popular voice conversion approach, based on Gaussian mixture modeling, tends to suffer from either model overfitting or oversmoothing. To overcome these shortcomings of the traditional approach, we recently proposed using dynamic kernel partial least squares (DKPLS) regression in the framework of parallel-data voice conversion. However, parallel training data from both the source and the target speaker are not always available. In this paper, we extend the DKPLS-based conversion approach to non-parallel data by combining it with the well-known INCA alignment algorithm. The listening test results indicate that high-quality conversion can be achieved with the proposed combination. Furthermore, the performance of two variations of INCA is evaluated with both intra-lingual and cross-lingual data.

Bibliographic reference.  Silén, Hanna / Nurminen, Jani / Helander, Elina / Gabbouj, Moncef (2013): "Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression", In INTERSPEECH-2013, 373-377.