Workshop on the Auditory Basis of Speech Perception
Keele University, UK
This paper reviews past research on human speech perception and recent studies that compare the performance of humans and speech recognizers on six modern speech corpora with vocabularies ranging from 10 to 65,000 words. Machine error rates are often more than an order of magnitude greater than human error rates for quiet, clearly spoken speech. The gap widens further in noise and under other stressing conditions. Human performance remains high despite natural variability caused by new talkers, spontaneous speaking styles, noise, and reverberation. It also remains high despite unnatural degradations caused by waveform clipping, band-reject filtering, and analog waveform scrambling. Humans can also recognize quiet, clearly spoken nonsense syllables and words without high-level grammatical information. Much further algorithm development is required before even the low-level acoustic-phonetic accuracy of machines equals that of humans on real-world tasks.
Bibliographic reference. Lippmann, Richard P. (1996): "Speech perception by humans and machines", in ABSP-1996, 309-316.