ISCA Archive SPSC 2021

Contributions to neural speech synthesis using limited data enhanced with lexical features

Beáta Lőrincz

Building single- or multi-speaker neural network-based text-to-speech synthesis systems commonly relies on the availability of large amounts of high-quality recordings from each speaker, and on conditioning the training process on the speaker's identity or on a learned representation of it. However, when little data is available from each speaker, or the number of speakers is limited, the speech synthesis system can be hard to train and yields poor speaker similarity and naturalness. To address this issue, we explore several directions for improving the quality of the synthetic output speech: speaker adaptation, additional loss terms, data augmentation, speaker selection methods, and different types of textual representations. Our experiments focus on Romanian, which is considered a low-resource language. Objective and subjective measures are used to evaluate the effectiveness of the proposed methods.


Cite as: Lőrincz, B. (2021) Contributions to neural speech synthesis using limited data enhanced with lexical features. Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, 83-85

@inproceedings{lorincz21_spsc,
  author={Beáta Lőrincz},
  title={{Contributions to neural speech synthesis using limited data enhanced with lexical features}},
  year=2021,
  booktitle={Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication},
  pages={83--85}
}