Speech Recognition and Intrinsic Variation (SRIV2006), Toulouse, France
In this study, a fair comparison of human and machine speech recognition is established by using the same paradigms for human speech recognition (HSR) and automatic speech recognition (ASR). To ensure equal conditions, a speech database specifically designed for this task is used. The results for HSR and ASR are broken down by several intrinsic variabilities, such as speaking rate, speaking effort and dialect. Across all conditions, ASR error rates are at least 300% higher than those of humans, even though the task allows no contextual knowledge to be exploited. A more detailed analysis of errors in HSR and ASR is carried out by decomposing speech into its phonetic features, such as voicing, manner of articulation and place of articulation. Confusion matrices for these features show that voicing information is crucial for distinguishing between certain consonants. The most prominent features for ASR often neglect voicing information, which might contribute to the large performance gap between HSR and ASR.
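The feature-level analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`feature_confusions`, `feature_error_rate`, `relative_increase_percent`) are hypothetical, the response pairs are made-up toy data rather than results from the paper, and only the voicing labels of the six plosives follow standard phonetic classification. The sketch collapses (presented, recognized) consonant pairs into a voicing confusion matrix and shows how a relative increase in error rate is computed; under this definition, "300% higher" means the ASR error rate is four times the human one.

```python
from collections import Counter

# Voicing of six plosives (standard phonetic classification).
VOICING = {
    "p": "unvoiced", "t": "unvoiced", "k": "unvoiced",
    "b": "voiced", "d": "voiced", "g": "voiced",
}


def feature_confusions(pairs, feature_map):
    """Collapse (presented, recognized) phoneme pairs into feature-level counts."""
    counts = Counter()
    for presented, recognized in pairs:
        counts[(feature_map[presented], feature_map[recognized])] += 1
    return counts


def feature_error_rate(counts):
    """Fraction of responses whose feature value differs from the presented one."""
    total = sum(counts.values())
    errors = sum(n for (shown, heard), n in counts.items() if shown != heard)
    return errors / total if total else 0.0


def relative_increase_percent(asr_error, hsr_error):
    """Relative increase of the ASR error rate over the HSR error rate, in percent."""
    return 100.0 * (asr_error - hsr_error) / hsr_error


# Made-up toy responses, NOT data from the paper.
toy_pairs = [
    ("p", "p"), ("p", "b"), ("b", "b"), ("d", "t"),
    ("t", "t"), ("g", "g"), ("k", "g"), ("d", "d"),
]

counts = feature_confusions(toy_pairs, VOICING)
for (shown, heard), n in sorted(counts.items()):
    print(f"{shown:>8} -> {heard:<8} {n}")
print("voicing error rate:", feature_error_rate(counts))
```

The same `feature_confusions` helper could be reused for manner or place of articulation by passing a different feature map, which keeps the phoneme-level data separate from the feature definitions.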
Bibliographic reference. Meyer, Bernd / Wesker, Thorsten / Brand, Thomas / Mertins, Alfred / Kollmeier, Birger (2006): "A human-machine comparison in speech recognition based on a logatome corpus", In SRIV-2006, 95-100.