12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Woefzela - An Open-Source Platform for ASR Data Collection in the Developing World

Nic J. de Vries (1), Jaco Badenhorst (1), Marelie H. Davel (2), Etienne Barnard (2), Alta de Waal (1)

(1) CSIR, South Africa
(2) North-West University, South Africa

Building transcribed speech corpora for under-resourced languages plays a pivotal role in developing speech technologies for such languages. We have developed an open-source tool for devices running the Android operating system to facilitate the efficient collection of speech data for Automatic Speech Recognition (ASR) system development. The tool was designed for use in typical developing-world conditions; we present the relevant design choices and analyse the effectiveness of this tool by means of a case study. In particular, we introduce a novel semi-real-time quality monitoring system, which increases the efficiency of the data collection process.


Bibliographic reference. Vries, Nic J. de / Badenhorst, Jaco / Davel, Marelie H. / Barnard, Etienne / Waal, Alta de (2011): "Woefzela - an open-source platform for ASR data collection in the developing world", In INTERSPEECH-2011, 3177-3180.