Speech Recognition and Intrinsic Variation (SRIV2006)

Toulouse, France
May 20, 2006

A Human-Machine Comparison in Speech Recognition Based on a Logatome Corpus

Bernd Meyer (1), Thorsten Wesker (1), Thomas Brand (1), Alfred Mertins (2), Birger Kollmeier (1)

Department of Physics, (1) Medical Physics, (2) Signal Processing Group, Carl von Ossietzky University of Oldenburg, Germany

In this study, a fair comparison of human and machine speech recognition is established by using the same paradigms for human speech recognition (HSR) and automatic speech recognition (ASR). To ensure equal conditions, a speech database specifically designed for this task is used. The results for HSR and ASR are broken down by several intrinsic variabilities such as speaking rate, speaking effort and dialect. Across all conditions, ASR error rates are at least 300% higher than those of humans, even though no contextual knowledge can be exploited. A more detailed analysis of errors in HSR and ASR is carried out by decomposing speech into phonetic features such as voicing, manner of articulation and place of articulation. Confusion matrices for these features show that voicing information is crucial for distinguishing between certain consonants. The most prominent ASR features often neglect voicing information, which might contribute to the large performance gap between HSR and ASR.
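As a rough illustration of the feature-level analysis mentioned in the abstract, the sketch below collapses phoneme confusion counts into voicing confusion counts and computes a relative increase in error rate. It is not the authors' procedure. The function names, the small voicing table and all counts are assumptions invented for illustration, and "300% higher" is read here as an ASR error rate four times the HSR error rate.

```python
from collections import defaultdict

# Assumed voicing labels for a small, illustrative set of plosives.
VOICING = {
    "p": "unvoiced", "t": "unvoiced", "k": "unvoiced",
    "b": "voiced", "d": "voiced", "g": "voiced",
}


def feature_confusions(phoneme_confusions, feature_map):
    """Collapse (presented, recognized) phoneme counts into feature-level counts."""
    collapsed = defaultdict(int)
    for (presented, recognized), count in phoneme_confusions.items():
        key = (feature_map[presented], feature_map[recognized])
        collapsed[key] += count
    return dict(collapsed)


def error_rate(confusions):
    """Fraction of responses whose recognized label differs from the presented one."""
    total = sum(confusions.values())
    errors = sum(c for (p, r), c in confusions.items() if p != r)
    return errors / total if total else 0.0


def relative_increase(asr_error, hsr_error):
    """Relative increase of the ASR error over the HSR error, in percent."""
    return 100.0 * (asr_error - hsr_error) / hsr_error


# Toy counts, invented for illustration only (not data from the paper).
toy_counts = {
    ("p", "p"): 8, ("p", "b"): 2,
    ("b", "b"): 9, ("b", "p"): 1,
    ("t", "t"): 9, ("t", "d"): 1,
}

voicing_counts = feature_confusions(toy_counts, VOICING)
voicing_error = error_rate(voicing_counts)
```

With these toy counts, 4 of 30 responses cross the voicing boundary. Calling `relative_increase(40, 10)` gives 300.0, the case in which the ASR error rate is four times the HSR error rate.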

Bibliographic reference. Meyer, Bernd / Wesker, Thorsten / Brand, Thomas / Mertins, Alfred / Kollmeier, Birger (2006): "A human-machine comparison in speech recognition based on a logatome corpus", in SRIV-2006, 95-100.