This paper proposes an economical and effective phonetic transcription method for large corpora of non-native English speech. The method yields consistent transcriptions even though the transcribers are themselves non-native speakers. To minimize confusion during transcription, the Korean transcribers are given, for reference, forced-aligned phone sequences and a set of candidate phones representing the mispronunciations that Korean L2 learners are expected to make. The proposed method is evaluated by measuring transcription agreement using Fleiss’ kappa as well as percentage agreement. Furthermore, transcription consistency is analyzed by comparing it with that of an English corpus transcribed by native speakers. The proposed method achieves a transcription agreement of 0.869, whereas the Buckeye corpus, transcribed by native speakers, shows a transcription agreement of 0.803.
Index Terms: transcription method, transcription agreement, non-native transcriber, forced alignment
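To illustrate the two agreement measures named in the abstract, the following is a minimal sketch of Fleiss' kappa and a percentage agreement score computed over phone labels. It is not the authors' evaluation code: the function names, the toy phone labels, and the choice of mean pairwise agreement as the definition of "percentage agreement" are all assumptions made for this example, and the paper may define the percentage differently.

```python
from collections import Counter


def _category_counts(ratings):
    """Return one Counter per item, mapping label -> number of raters who chose it."""
    return [Counter(item) for item in ratings]


def pairwise_agreement(ratings):
    """Mean fraction of agreeing rater pairs per item (assumed definition of
    'percentage agreement'; every item must have the same number of raters)."""
    n = len(ratings[0])  # raters per item
    per_item = [
        (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
        for counts in _category_counts(ratings)
    ]
    return sum(per_item) / len(per_item)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels from n raters."""
    n = len(ratings[0])
    total = len(ratings) * n
    # Observed agreement: mean per-item pairwise agreement.
    p_bar = pairwise_agreement(ratings)
    # Chance agreement: sum of squared overall label proportions.
    label_totals = Counter(label for item in ratings for label in item)
    p_e = sum((count / total) ** 2 for count in label_totals.values())
    if p_e == 1.0:
        # Only one label was ever used, so kappa is undefined; report 1.0 by convention.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 phone segments, each labelled by 3 transcribers.
toy = [
    ["ae", "ae", "ae"],
    ["ae", "eh", "ae"],
    ["eh", "eh", "eh"],
    ["ih", "ih", "eh"],
]
print(f"percentage agreement: {pairwise_agreement(toy):.3f}")
print(f"Fleiss' kappa:        {fleiss_kappa(toy):.3f}")
```

Kappa corrects the raw agreement for the agreement expected by chance, so it is lower than the percentage score whenever a few labels dominate the data.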
Cite as: Ryu, H., Lee, K., Kim, S., Chung, M. (2011) Improving transcription agreement of non-native English speech corpus transcribed by non-natives. Proc. Speech and Language Technology in Education (SLaTE 2011), 61-64
@inproceedings{ryu11_slate,
  author={Hyuksu Ryu and Kyuwhan Lee and Sunhee Kim and Minhwa Chung},
  title={{Improving transcription agreement of non-native English speech corpus transcribed by non-natives}},
  year=2011,
  booktitle={Proc. Speech and Language Technology in Education (SLaTE 2011)},
  pages={61--64}
}