This paper describes an evaluation of many-to-one voice conversion (VC) algorithms that convert an arbitrary speaker's voice into a particular target speaker's voice. These algorithms effectively generate a conversion model for a new source speaker using multiple parallel data sets of many pre-stored source speakers and the single target speaker. We conducted experimental evaluations to demonstrate the conversion performance of each of the many-to-one VC algorithms, including not only the conventional algorithms based on a speaker-independent GMM and on eigenvoice conversion (EVC), but also new algorithms based on speaker selection and on EVC with speaker adaptive training (SAT). The results show that adapting the conversion model significantly improves conversion performance, and that the algorithm based on speaker selection works well even with a very limited amount of adaptation data.
Cite as: Tani, D., Ohtani, Y., Toda, T., Saruwatari, H., Shikano, K. (2007) An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 107-112
@inproceedings{tani07_ssw,
  author={Daisuke Tani and Yamato Ohtani and Tomoki Toda and Hiroshi Saruwatari and Kiyohiro Shikano},
  title={{An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={107--112}
}