8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Benchmarking Human Performance on the Acoustic and Linguistic Subtasks of ASR Systems

László Tóth

Hungarian Academy of Sciences, Hungary

Many believe that comparing machine and human speech recognition could help determine both the room for improvement in speech recognizers and its direction. Yet such experiments are rarely conducted, or they are carried out over domains so complex that instructive conclusions are hard to draw. In this paper we attempt to measure human performance separately on the tasks of the acoustic and the language models of ASR systems. To simulate acoustic decoding, subjects were instructed to phonetically transcribe short nonsense sentences. Here, besides the well-known superior segment classification, we also observed good performance in word segmentation. To imitate higher-level processing, subjects had to correct deliberately corrupted texts. Here we found that humans can achieve a word accuracy of about 80% even when almost one third of the phonemes are incorrect, and that the word error rate roughly halves when word boundary positions are provided.
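
The abstract does not state how word accuracy and word error rate were computed. The following is a minimal sketch, assuming the conventional edit-distance definition (substitutions, deletions and insertions divided by the reference length) and taking accuracy as one minus the error rate. The function names are illustrative and do not come from the paper, and the paper's own scoring may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (words or phonemes)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def word_accuracy(ref_text, hyp_text):
    """Word accuracy, assumed here to be 1 minus the word error rate."""
    return 1.0 - error_rate(ref_text.split(), hyp_text.split())
```

Because `edit_distance` works on any sequence of tokens, the same function yields a phoneme error rate when given lists of phoneme symbols, so a corrupted input and a subject's corrected output can be scored on the same footing.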


Bibliographic reference. Tóth, László (2007): "Benchmarking human performance on the acoustic and linguistic subtasks of ASR systems", in INTERSPEECH-2007, 382-385.