SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages, Hanoi, Vietnam
In this paper we outline methods and best practices for collecting speech data for under-resourced languages. The discussion focuses on ways of improving the quality of the collection and its turnaround time. The paper shows how to deal with matters concerning assistants and technical problems, and suggests techniques with which data management may be optimised. It aims to give the reader a complete overview of the improvements made during the course of a real data collection project with tangible problems and results.
Bibliographic reference. Rooyen, Marissa van / Zyl, Cecile van / Oosthuizen, Nico (2008): "The systematic collection of speech corpora for all eleven official South African languages", In SLTU-2008, 58-62.