We propose a method for training a many-to-many voice conversion (VC) model that can additionally learn users’ voices while protecting the privacy of their data. Conventional many-to-many VC methods train a VC model using a publicly available or proprietary multi-speaker corpus. However, they do not always achieve high-quality VC for input speech from various users. Our method is based on federated learning, a framework of distributed machine learning where a developer and users cooperatively train a machine learning model while protecting the privacy of user-owned data. We present a proof-of-concept method on the basis of StarGANv2-VC (i.e., Fed-StarGANv2-VC) and demonstrate that our method can achieve speaker similarity comparable to conventional non-federated StarGANv2-VC.
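As a rough illustration of the federated learning idea mentioned in the abstract, the sketch below shows one round of sample-weighted parameter averaging (commonly called federated averaging). The abstract does not state which aggregation rule Fed-StarGANv2-VC uses, so the averaging rule, the function names `local_update` and `federated_average`, the toy parameter layout, and the learning rate are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Weights = Dict[str, List[float]]


def local_update(weights: Weights, gradient: Weights, lr: float = 0.1) -> Weights:
    """One gradient step computed on a user's device.

    Only the updated weights leave the device; the user's speech data do not.
    """
    return {
        name: [w - lr * g for w, g in zip(weights[name], gradient[name])]
        for name in weights
    }


def federated_average(client_weights: List[Weights], num_samples: List[int]) -> Weights:
    """Server-side aggregation: average client weights, weighted by data size."""
    total = sum(num_samples)
    reference = client_weights[0]
    return {
        name: [
            sum(n * cw[name][i] for cw, n in zip(client_weights, num_samples)) / total
            for i in range(len(reference[name]))
        ]
        for name in reference
    }


# Toy round with two users (all numbers are made up for illustration).
global_w: Weights = {"layer": [0.0, 0.0]}
user_a = local_update(global_w, {"layer": [1.0, -1.0]})   # user A holds 10 utterances
user_b = local_update(global_w, {"layer": [-1.0, 3.0]})   # user B holds 30 utterances
new_global = federated_average([user_a, user_b], [10, 30])
```

In this sketch the developer's server only ever sees `user_a` and `user_b` (model weights), never the underlying recordings, which is the privacy property the abstract refers to. A real VC system would exchange the parameters of a neural network rather than a two-element list.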
Cite as: Hirai, R., Saito, Y., Saruwatari, H. (2023) Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 94-99, doi: 10.21437/SSW.2023-15
@inproceedings{hirai23_ssw,
  author={Ryunosuke Hirai and Yuki Saito and Hiroshi Saruwatari},
  title={{Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion}},
  year=2023,
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},
  pages={94--99},
  doi={10.21437/SSW.2023-15}
}