ISCA Archive Interspeech 2009

ASR corpus design for resource-scarce languages

Etienne Barnard, Marelie Davel, Charl van Heerden

We investigate the number of speakers and the amount of data required to develop usable speaker-independent speech-recognition systems in resource-scarce languages. Our experiments employ the Lwazi corpus, which contains speech in the eleven official languages of South Africa. We find that a surprisingly small number of speakers (fewer than 50) and around 10 to 20 hours of speech per language are sufficient for acceptable phone-based recognition.

doi: 10.21437/Interspeech.2009-727

Cite as: Barnard, E., Davel, M., van Heerden, C. (2009) ASR corpus design for resource-scarce languages. Proc. Interspeech 2009, 2847-2850, doi: 10.21437/Interspeech.2009-727

@inproceedings{barnard09_interspeech,
  author={Etienne Barnard and Marelie Davel and Charl van Heerden},
  title={{ASR corpus design for resource-scarce languages}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2847--2850},
  doi={10.21437/Interspeech.2009-727}
}