On Employing a Highly Mismatched Crowd for Speech Transcription

Purushotam Radadia, Rahul Kumar, Kanika Kalra, Shirish Karande, Sachin Lodha

Crowdsourcing provides a cheap and fast way to obtain speech transcriptions. The size of the crowd available for a task decreases as the task's skill requirements increase. Hence, there has been recent interest in studying the utility of mismatched crowd workers, who provide transcriptions without knowing the source language. Nevertheless, these studies have required that workers be able to transcribe in Roman script. We believe that if the script constraint is removed, countries like India can provide a significantly larger crowd base. With this motivation, in this paper we consider the transcription of spoken Russian words by a rural Indian crowd that is unfamiliar with Russian and has very limited knowledge of English. The crowd we employed knew Gujarati, Marathi, and Telugu and used the scripts of these languages to provide their transcriptions. We use an insertion-deletion-substitution channel to model the transcription errors; treating the workers as parallel channels lets us combine the crowd inputs easily. We show that four transcriptions in Indic scripts (2 Gujarati, 1 Marathi, 1 Telugu) yield an accuracy of 73.77% (vs. 47% for the ROVER algorithm) and a 4-best accuracy of 86.48%, even without any worker filtering.
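The abstract names two ideas without giving their details: an insertion-deletion-substitution (IDS) channel for a single worker's errors, and a parallel-channel model for combining workers. The Python sketch below shows one generic way these could fit together. It is not the authors' implementation. It assumes that the Gujarati, Marathi, and Telugu transcriptions have already been mapped onto a common romanized symbol set, that the channel probabilities are fixed by hand rather than learned, and that decoding picks from a closed lexicon of candidate Russian words under a uniform prior. The names `ids_likelihood` and `rank_candidates`, the parameter values, and the example words are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical, hand-set channel parameters chosen only for illustration;
# a real system would estimate them from data.
P_DIAG = 0.8        # a source symbol is transcribed (copied or substituted)
P_DEL = 0.1         # a source symbol is dropped by the worker
P_INS = 0.1         # the worker inserts a spurious symbol
P_CORRECT = 0.9     # on a diagonal step, the symbol is copied unchanged
ALPHABET_SIZE = 26  # assumed size of the common (romanized) symbol set


def emit(x: str, y: str) -> float:
    """Probability that source symbol x is written as y on a diagonal step."""
    if x == y:
        return P_CORRECT
    return (1.0 - P_CORRECT) / (ALPHABET_SIZE - 1)


def ids_likelihood(source: str, transcript: str) -> float:
    """P(transcript | source) under a simple IDS channel, summed over all
    alignments with a forward (edit-distance-style) dynamic program."""
    n, m = len(source), len(transcript)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0:  # deletion: consume a source symbol, emit nothing
                total += f[i - 1][j] * P_DEL
            if j > 0:  # insertion: emit a symbol without consuming the source
                total += f[i][j - 1] * P_INS / ALPHABET_SIZE
            if i > 0 and j > 0:  # copy or substitution
                total += f[i - 1][j - 1] * P_DIAG * emit(source[i - 1], transcript[j - 1])
            f[i][j] = total
    return f[n][m]


def rank_candidates(lexicon: Sequence[str],
                    transcripts: Sequence[str]) -> List[Tuple[str, float]]:
    """Parallel-channel combination: every worker is treated as an independent
    IDS channel observing the same word, so log-likelihoods add. A uniform
    prior over the lexicon is assumed, so it does not affect the ranking."""
    scored = []
    for word in lexicon:
        score = sum(math.log(ids_likelihood(word, t)) for t in transcripts)
        scored.append((word, score))
    return sorted(scored, key=lambda ws: ws[1], reverse=True)


# Hypothetical example: four noisy, romanized transcriptions of one word.
lexicon = ["privet", "spasibo", "sobaka", "moloko"]
crowd = ["pirvet", "privat", "prevet", "privit"]
ranking = rank_candidates(lexicon, crowd)
n_best = [word for word, _ in ranking[:4]]  # a 4-best list
print(ranking[0][0])  # privet
```

Summing log-likelihoods across workers is what makes the parallel-channel view convenient. Each extra transcription simply adds one more term to every candidate's score, and an n-best list falls out of the sorted ranking.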

DOI: 10.21437/Interspeech.2016-673

Cite as

Radadia, P., Kumar, R., Kalra, K., Karande, S., Lodha, S. (2016) On Employing a Highly Mismatched Crowd for Speech Transcription. Proc. Interspeech 2016, 3017-3021.

author={Purushotam Radadia and Rahul Kumar and Kanika Kalra and Shirish Karande and Sachin Lodha},
title={On Employing a Highly Mismatched Crowd for Speech Transcription},
booktitle={Interspeech 2016},