Large TTS corpora exist for commercial systems built for high-resource languages such as Mandarin, English, and Spanish, but not for many other languages, such as Amharic, that are spoken by millions of people. We are working with “found” data, either collected for other purposes (e.g. training ASR systems) or available on the web (e.g. news broadcasts, audiobooks), to produce TTS systems for low-resource languages that currently lack expensive commercial systems. This study describes TTS systems built for Amharic from “found” data, including systems built from different acoustic-prosodic subsets of the data, systems built from combined high- and lower-quality data using adaptation, and systems that use prediction of Amharic gemination to improve naturalness as perceived by evaluators.
Cite as: Tesfaye Biru, E., Tofik Mohammed, Y., Tofu, D., Cooper, E., Hirschberg, J. (2019) Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis. Proc. 10th ISCA Workshop on Speech Synthesis (SSW 10), 205-210, doi: 10.21437/SSW.2019-37
@inproceedings{tesfayebiru19_ssw,
  author={Elshadai {Tesfaye Biru} and Yishak {Tofik Mohammed} and David Tofu and Erica Cooper and Julia Hirschberg},
  title={{Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis}},
  year=2019,
  booktitle={Proc. 10th ISCA Workshop on Speech Synthesis (SSW 10)},
  pages={205--210},
  doi={10.21437/SSW.2019-37}
}