SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages
The paper proposes a human-computation-based scheme for transcribing speech corpora. The core idea is to implement a Web-based language learning system that collects orthographic and phonetic labels from a large number of language learners, and then applies selection criteria to choose the most commonly entered labels as the transcriptions of the corpora. The scheme is essentially a form of distributed knowledge acquisition. Its benefit is that it makes the transcription task neither tedious nor costly. The design of a system for transcribing Min Nan speech corpora is described in detail.
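The abstract does not state which criteria are used to pick the commonly entered labels. Purely as an illustration, the Python sketch below assumes a simple agreement rule. An utterance receives a transcription only when enough learners have labelled it and one label accounts for a large enough share of their inputs. The function name `select_transcription` and the thresholds `min_votes` and `min_agreement` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Optional, Sequence


def select_transcription(
    labels: Sequence[str],
    min_votes: int = 3,
    min_agreement: float = 0.6,
) -> Optional[str]:
    """Return the most commonly entered label for one utterance, or None.

    Hypothetical selection rule (an assumption, not the paper's criteria):
    - at least `min_votes` non-empty learner labels must be available;
    - the winning label must make up at least `min_agreement` of them.
    """
    # Collapse runs of whitespace so "li2  ho2" and "li2 ho2" count as one label;
    # drop empty submissions entirely.
    normalised = [" ".join(label.split()) for label in labels if label.strip()]
    if len(normalised) < min_votes:
        return None  # too few learner inputs to decide yet
    label, count = Counter(normalised).most_common(1)[0]
    # With min_agreement > 0.5, two tied labels can never both pass,
    # so any label that passes is the unique winner.
    if count / len(normalised) < min_agreement:
        return None  # no sufficiently dominant label; keep collecting
    return label


# Illustrative learner inputs for one utterance (tone-numbered romanization).
votes = ["li2 ho2", "li2 ho2", "li ho2", "li2  ho2"]
print(select_transcription(votes))  # 3 of 4 normalised labels agree -> "li2 ho2"
```

Returning `None` instead of a best guess leaves an utterance open for more learner input, which fits the idea that labels accumulate from many users over time. A real system would also need rules for normalising orthographic and phonetic variants. The whitespace handling shown here is only a minimal placeholder.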
Index Terms: speech transcription, southern Min (Min Nan) language, distributed knowledge acquisition, Web-based language learning
Bibliographic reference. Cai, Jun / Feldmar, Jacques / Laprie, Yves / Haton, Jean-Paul (2008): "Transcribing southern Min speech corpora with a web-based language learning system", In SLTU-2008, 102-107.