ISCA Archive SSW 2021

Improving Polyglot Speech Synthesis through Multi-task and Adversarial Learning

Jason Fong, Jilong Wu, Prabhav Agrawal, Andrew Gibiansky, Thilo Koehler, Qing He

It is still quite challenging for polyglot speech synthesis systems to synthesise speech with the same pronunciations and accent as a native speaker, especially when there are fewer speakers per language. In this work, we target an extreme version of the polyglot synthesis problem, where we have only one speaker per language, and the system has to learn to disentangle speaker features from language features from just one speaker-language pair. To tackle this problem, we propose a novel approach based on a combination of multi-task learning and adversarial learning to help the model produce more realistic acoustic features for speaker-language combinations for which we have no data. Our proposed system improves the overall naturalness of synthesised speech, achieving up to 4.2% higher naturalness than a multi-speaker baseline. Our qualitative listening tests also demonstrate that the system produces speech that sounds less accented and more natural to a native speaker.


doi: 10.21437/SSW.2021-30

Cite as: Fong, J., Wu, J., Agrawal, P., Gibiansky, A., Koehler, T., He, Q. (2021) Improving Polyglot Speech Synthesis through Multi-task and Adversarial Learning. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 172-176, doi: 10.21437/SSW.2021-30

@inproceedings{fong21_ssw,
  author={Jason Fong and Jilong Wu and Prabhav Agrawal and Andrew Gibiansky and Thilo Koehler and Qing He},
  title={{Improving Polyglot Speech Synthesis through Multi-task and Adversarial Learning}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={172--176},
  doi={10.21437/SSW.2021-30}
}