ISCA Archive S4SG 2022

Building TTS systems for low resource languages under resource constraints

Perez Ogayo, Graham Neubig, Alan W Black

The field of speech synthesis has advanced to the point of producing natural-sounding speech, given sufficient high-quality data. As a result, speech synthesis applications are becoming ubiquitous for high resource languages. However, support for low resource languages is limited by the lack of data. This project aims to democratize text-to-speech systems and datasets for African languages. Through a participatory approach, we curate data from existing "found" sources and record datasets using more affordable equipment. We build Flite-based voices that can be easily deployed to mobile phones and require less expensive compute to train, so that the work is accessible. We release the speech data, code, and trained voices for 16 African languages to help researchers and developers. In addition, through our website, users can interact with the synthesizers and provide feedback for their iterative improvement. Finally, we show that we can develop synthesizers that generate intelligible speech with 25 minutes of created speech, even when recorded in suboptimal environments.

This paper appears in Interspeech 2022 as "Building African Voices", doi: 10.21437/Interspeech.2022-152


Cite as: Ogayo, P., Neubig, G., Black, A.W. (2022) Building TTS systems for low resource languages under resource constraints. Proc. 1st Workshop on Speech for Social Good (S4SG)

@inproceedings{ogayo22_s4sg,
  author={Perez Ogayo and Graham Neubig and Alan W Black},
  title={{Building TTS systems for low resource languages under resource constraints}},
  year=2022,
  booktitle={Proc. 1st Workshop on Speech for Social Good (S4SG)},
  pages={}
}