It is widely acknowledged that human listeners significantly outperform machines at transcribing speech. This paper presents a paradigm for perceptual experiments that aims to improve our understanding of human and automatic speech recognition errors. The role of context length is investigated through the perceptual recovery of short homophonic or near-homophonic words that frequently give rise to automatic transcription errors. The same experimental protocol, based on the transcription of speech stimuli of varying size, is applied to both French and English. Our hypothesis is that the ambiguity due to homophonic words decreases as context size increases in both languages, which in turn should lead to fewer perception and transcription errors. The results show that context plays a central role: the human word error rate decreases significantly with increasing context. The long-term aim is to improve the modelling of such ambiguous items in order to reduce automatic transcription errors.
Bibliographic reference. Vasilescu, I. / Yahia, D. / Snoeren, N. / Adda-Decker, Martine / Lamel, Lori (2011): "Cross-lingual study of ASR errors: on the role of the context in human perception of near-homophones", In INTERSPEECH-2011, 1949-1952.