ISCA Archive IberSPEECH 2022

Exploring the limits of neural voice cloning: A case study on two well-known personalities

Ander González-Docasal, Aitor Álvarez, Haritz Arzelus

This work describes two voice cloning processes, one successful and one failed, carried out for two famous personalities whose synthetic voices were to be broadcast in a high-impact podcast and in a Spanish public television program. Whilst a good-quality synthesised voice could be generated for the first public figure, the voice of the second was not adequate for broadcast on television given its low speech quality. In this study, we explore the limits of neural voice cloning considering the different conditions of the training material employed in each case and, based on several objective measures (amount of training data, phoneme coverage, SNR, MCD and PESQ), we analyse the main features to be considered for high-quality synthetic voice generation. In addition, a webpage is provided where audio samples generated by each cloning model are available.


doi: 10.21437/IberSPEECH.2022-3

Cite as: González-Docasal, A., Álvarez, A., Arzelus, H. (2022) Exploring the limits of neural voice cloning: A case study on two well-known personalities. Proc. IberSPEECH 2022, 11-15, doi: 10.21437/IberSPEECH.2022-3

@inproceedings{gonzalezdocasal22_iberspeech,
  author={Ander González-Docasal and Aitor Álvarez and Haritz Arzelus},
  title={{Exploring the limits of neural voice cloning: A case study on two well-known personalities}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={11--15},
  doi={10.21437/IberSPEECH.2022-3}
}