A comparison between automatic speech recognition (ASR) and human speech recognition (HSR) is performed as a prerequisite for identifying sources of errors and for improving feature extraction in ASR. HSR and ASR experiments are carried out on the same logatome database, which consists of nonsense syllables. Two kinds of signals are presented to human listeners: first, noisy speech samples are converted to Mel-frequency cepstral coefficients (MFCCs) and resynthesized to speech, with information about voicing and fundamental frequency being discarded; second, the original signals with added noise are presented, which allows the information loss caused by resynthesis to be evaluated. The analysis also covers the degradation of ASR performance caused by dialect or accent and shows that different error patterns emerge for ASR and HSR. The information loss induced by the calculation of ASR features has the same effect as a deterioration of the SNR by 10 dB.
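The MFCC front end referred to above (power spectrum, mel filterbank, log compression, DCT) can be sketched in plain NumPy. The frame length, hop size, filter count, and cepstral order below are common illustrative defaults, not the parameters used in the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Frame the signal, apply a Hamming window, take the power spectrum.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies with a floor, then log compression.
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    log_e = np.log(energies)
    # DCT-II to decorrelate; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return log_e @ dct.T  # shape: (n_frames, n_ceps)
```

Note that the phase and excitation (voicing, fundamental frequency) are discarded at the power-spectrum step, which is exactly the information loss the resynthesis experiment in the paper quantifies.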
Bibliographic reference. Meyer, Bernd T. / Wächter, Matthias / Brand, Thomas / Kollmeier, Birger (2007): "Phoneme confusions in human and automatic speech recognition", In INTERSPEECH-2007, 1485-1488.