ISCA International Workshop on Speech and Language Technology in Education (SLaTE 2011)

Venice, Italy
August 24-26, 2011

Improving Transcription Agreement of Non-Native English Speech Corpus Transcribed by Non-Natives

Hyuksu Ryu (1), Kyuwhan Lee (2), Sunhee Kim (2), Minhwa Chung (1)

(1) Department of Linguistics; (2) Center for Humanities and Information; Seoul National University, Seoul, Korea

This paper proposes an economical and effective phonetic transcription method for large non-native English speech corpora. The method yields consistent transcriptions even though the corpus is transcribed by non-native speakers. To minimize confusion during the transcription process, the Korean transcribers are given two references: forced-aligned phone sequences and a set of candidate mispronunciation phones that Korean L2 learners are expected to produce. The proposed method is evaluated by measuring transcription agreement with Fleiss' kappa as well as percentage agreement. Furthermore, transcription consistency is analyzed by comparing it with that of an English corpus transcribed by native speakers. The proposed method achieves a transcription agreement of 0.869, while the Buckeye corpus, transcribed by native speakers, shows a transcription agreement of 0.803.
Index Terms. transcription method, transcription agreement, non-native transcriber, forced alignment
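
The agreement measures named in the abstract can be illustrated with a short sketch. The code below is not the authors' evaluation script; it is a minimal Python implementation of the standard Fleiss' kappa formula, assuming every segment is labeled by the same number of transcribers. The function name `fleiss_kappa`, the toy phone labels, and the choice to report percentage agreement as the mean proportion of agreeing transcriber pairs per segment are assumptions made for illustration; the paper may define percentage agreement differently.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Return (mean pairwise agreement, Fleiss' kappa).

    ratings: list of items (e.g. phone segments); each item is a list of
    labels, one per transcriber. All items need the same number of labels.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of transcriber pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    # Observed agreement, averaged over items.
    p_observed = agreement_sum / n_items

    # Chance agreement from the overall label distribution.
    total = n_items * n_raters
    p_chance = sum((c / total) ** 2 for c in category_totals.values())
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")

    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa


if __name__ == "__main__":
    # Hypothetical toy data: 4 segments, 3 transcribers, ARPAbet-style labels.
    toy = [
        ["AE", "AE", "AE"],
        ["AE", "EH", "AE"],
        ["R", "R", "R"],
        ["L", "R", "R"],
    ]
    agreement, kappa = fleiss_kappa(toy)
    print(f"mean pairwise agreement: {agreement:.3f}")
    print(f"Fleiss' kappa: {kappa:.3f}")
```

Kappa corrects the observed agreement for the agreement expected by chance, so it is typically lower than raw percentage agreement on the same data; this is one reason to report both figures.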


Bibliographic reference. Ryu, Hyuksu / Lee, Kyuwhan / Kim, Sunhee / Chung, Minhwa (2011): "Improving transcription agreement of non-native English speech corpus transcribed by non-natives", in SLaTE-2011, 61-64.