Automatic real-time captioning provides immediate, on-demand access to spoken content in lectures or talks, and is a crucial accommodation for deaf and hard of hearing (DHH) people. However, in the presence of specialized content, as in technical talks, automatic speech recognition (ASR) still makes mistakes that may render the output incomprehensible. In this paper, we introduce a new approach that allows audience members or crowd workers to quickly correct errors that they spot in ASR output. Prior approaches required the crowd worker to manually “edit” the ASR hypothesis by selecting and replacing the text, which is not suitable for real-time scenarios. Our approach is faster and allows the worker to simply type corrections for misrecognized words as soon as they spot them. The system then finds the most likely position for the correction in the ASR output using keyword search (KWS) and stitches the word into the ASR output. Our work demonstrates the potential of computation to incorporate human input quickly enough to be usable in real-time scenarios, and may be a better method for providing this vital accommodation to DHH people.
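The locate-and-stitch step can be illustrated with a minimal sketch. The abstract does not give the scoring details of the keyword search, so the example below makes a loud simplifying assumption: it replaces KWS with plain character-level string similarity over the ASR word sequence. The function names, the 0.4 threshold, and the sample sentence are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from difflib import SequenceMatcher


def find_best_position(hypothesis, correction):
    """Return (index, score) of the hypothesis word most similar to the correction.

    ASSUMPTION: character-level similarity stands in for keyword search (KWS);
    the abstract does not describe how KWS scores candidate positions.
    """
    best_index, best_score = -1, 0.0
    for i, word in enumerate(hypothesis):
        score = SequenceMatcher(None, word.lower(), correction.lower()).ratio()
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def stitch_correction(hypothesis, correction, threshold=0.4):
    """Replace the best-matching word with the typed correction.

    The threshold is a hypothetical guard: if no word is similar enough,
    the hypothesis is returned unchanged instead of guessing a position.
    """
    index, score = find_best_position(hypothesis, correction)
    if index < 0 or score < threshold:
        return list(hypothesis)
    patched = list(hypothesis)
    patched[index] = correction
    return patched


# Hypothetical ASR output in which "kernel" was misrecognized as "colonel".
asr_output = "the colonel trick maps data into a higher dimensional space".split()
print(" ".join(stitch_correction(asr_output, "kernel")))
```

In this sketch the worker types only the word "kernel" and never selects text in the caption; the position is inferred automatically, which is the interaction the abstract describes. String similarity is a weak proxy for a misrecognition that sounds alike but is spelled differently, so a practical system would need a stronger matching signal than this.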
Bibliographic reference. Gaur, Yashesh / Metze, Florian / Miao, Yajie / Bigham, Jeffrey P. (2015): "Using keyword spotting to help humans correct captioning faster", In INTERSPEECH-2015, 2829-2833.