ISCA Archive SSW 2023

Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion

Ryunosuke Hirai, Yuki Saito, Hiroshi Saruwatari

We propose a method for training a many-to-many voice conversion (VC) model that can additionally learn users’ voices while protecting the privacy of their data. Conventional many-to-many VC methods train a VC model using a publicly available or proprietary multi-speaker corpus. However, they do not always achieve high-quality VC for input speech from various users. Our method is based on federated learning, a framework of distributed machine learning where a developer and users cooperatively train a machine learning model while protecting the privacy of user-owned data. We present a proof-of-concept method on the basis of StarGANv2-VC (i.e., Fed-StarGANv2-VC) and demonstrate that our method can achieve speaker similarity comparable to conventional non-federated StarGANv2-VC.


doi: 10.21437/SSW.2023-15

Cite as: Hirai, R., Saito, Y., Saruwatari, H. (2023) Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 94-99, doi: 10.21437/SSW.2023-15

@inproceedings{hirai23_ssw,
  author={Ryunosuke Hirai and Yuki Saito and Hiroshi Saruwatari},
  title={{Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion}},
  year=2023,
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},
  pages={94--99},
  doi={10.21437/SSW.2023-15}
}