This article compares the errors made by automatic speech recognizers with those made by humans on near-homophones in American English and French. This exploratory study focuses on the impact of limited word context and the ambiguities that may result for automatic speech recognition (ASR) systems and human listeners. Perceptual experiments using 7-gram chunks centered on words that an ASR system transcribed either incorrectly or correctly show that humans make significantly more transcription errors on the former type of stimuli, which highlights their local ambiguity. The long-term aim of this study is to improve the modeling of such ambiguous items in order to reduce ASR errors.
Bibliographic reference. Vasilescu, Ioana / Adda-Decker, Martine / Lamel, Lori / Hallé, Pierre (2009): "A perceptual investigation of speech transcription errors involving frequent near-homophones in French and American English", In INTERSPEECH-2009, 144-147.