SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages

Hanoi, Vietnam
May 5-7, 2008

Transcribing Southern Min Speech Corpora With A Web-Based Language Learning System

Jun Cai (1,2), Jacques Feldmar (1), Yves Laprie (1), Jean-Paul Haton (1)

(1) Groupe Parole, LORIA-CNRS & INRIA, BP 239, 54600 Vandoeuvre-les-Nancy, France
(2) Dept. of Cognitive Science, Xiamen Univ., Xiamen, China

This paper proposes a human-computation-based scheme for transcribing speech corpora. The core idea is to build a Web-based language learning system that collects orthographic and phonetic labels from a large number of language learners, then applies selection criteria to adopt the most commonly entered labels as the transcriptions of the corpora. The scheme is essentially a form of distributed knowledge acquisition. Its benefit is that the transcription task becomes neither tedious nor costly. The design of a system for transcribing Min Nan speech corpora is described in detail.
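
The label-selection step mentioned in the abstract can be illustrated with a simple agreement vote. The abstract does not state the actual criteria, so the sketch below is an assumption rather than the authors' method: the function name `choose_label`, the thresholds `min_votes` and `min_share`, and the sample inputs are all hypothetical.

```python
from collections import Counter
from typing import Optional


def choose_label(labels: list[str], min_votes: int = 3,
                 min_share: float = 0.5) -> Optional[str]:
    """Return the most frequently entered label, or None if agreement is weak.

    Assumed criteria (not taken from the paper): the winning label needs at
    least `min_votes` entries and more than `min_share` of all entries.
    """
    # Normalise trivially different inputs (surrounding whitespace, letter case).
    cleaned = [s.strip().lower() for s in labels if s.strip()]
    if not cleaned:
        return None
    label, votes = Counter(cleaned).most_common(1)[0]
    # Accept only if enough learners agree, in absolute and relative terms.
    if votes >= min_votes and votes / len(cleaned) > min_share:
        return label
    return None


# Hypothetical learner inputs for one utterance, made up for illustration.
inputs = ["li ho", "li ho ", "Li ho", "li hó", "di ho"]
print(choose_label(inputs))  # prints "li ho" (3 of 5 entries agree)
```

Utterances for which the function returns None would stay untranscribed until more learners have entered labels, which is one plausible way to trade coverage for label reliability.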

Index Terms: speech transcription, Southern Min (Min Nan) language, distributed knowledge acquisition, Web-based language learning


Bibliographic reference.  Cai, Jun / Feldmar, Jacques / Laprie, Yves / Haton, Jean-Paul (2008): "Transcribing southern Min speech corpora with a web-based language learning system", In SLTU-2008, 102-107.