ISCA Archive SSW 2021

Combining speakers of multiple languages to improve quality of neural voices

Javier Latorre, Charlotte Bailleul, Tuuli Morrill, Alistair Conkie, Yannis Stylianou

In this work, we explore multiple architectures and training procedures for developing a multi-speaker, multilingual neural TTS system with two goals: a) improving quality when the available data in the target language is limited, and b) enabling cross-lingual synthesis. We report results from a large experiment using 30 speakers in 8 different languages across 15 different locales. The system is trained on the same amount of data per speaker. Compared to a single-speaker model, the proposed system, when fine-tuned to a speaker, produces significantly better quality in most cases while using less than 40% of the speaker's data used to build the single-speaker model. In cross-lingual synthesis, the generated quality is on average within 80% of that of native single-speaker models in terms of Mean Opinion Score (MOS).


doi: 10.21437/SSW.2021-7

Cite as: Latorre, J., Bailleul, C., Morrill, T., Conkie, A., Stylianou, Y. (2021) Combining speakers of multiple languages to improve quality of neural voices. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 37-42, doi: 10.21437/SSW.2021-7

@inproceedings{latorre21_ssw,
  author={Javier Latorre and Charlotte Bailleul and Tuuli Morrill and Alistair Conkie and Yannis Stylianou},
  title={{Combining speakers of multiple languages to improve quality of neural voices}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={37--42},
  doi={10.21437/SSW.2021-7}
}