8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Modelling the Human-Machine Gap in Speech Reception: Microscopic Speech Intelligibility Prediction for Normal-Hearing Subjects with an Auditory Model

Tim Jürgens, Thomas Brand, Birger Kollmeier

Carl von Ossietzky University of Oldenburg, Germany

In this study, speech intelligibility in noise for normal-hearing subjects is predicted by a model that consists of an auditory preprocessing stage and a speech recognizer. A highly systematic speech corpus of phoneme combinations (logatomes) allows the response rates and confusions of single phonemes to be analyzed. The predictions are validated by listening tests that use the same nonsense speech material. If the test utterances are not identical to those in the training material, the predicted psychometric function in noise is shifted by 13 dB toward higher signal-to-noise ratios (SNRs). This is consistent with the man-machine performance gap between human speech recognition (HSR) and automatic speech recognition (ASR) [1].

However, this offset shrinks to 4 dB in a second model design that uses identical recordings for training and testing. Furthermore, confusion matrices predicted with the second model design are compared to those of normal-hearing subjects.
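The two quantities the abstract compares, an SNR offset between psychometric functions and per-phoneme response rates from confusion matrices, can be illustrated with a short Python sketch. The abstract does not state the functional form of the psychometric function, so the sketch assumes a logistic curve parameterized by the speech reception threshold (SRT, the SNR at 50 % correct) and its slope at the SRT. The SRT values, the slope, and the 2×2 confusion matrix are hypothetical numbers chosen only to reproduce a 13 dB shift; they are not data from the study, and the helper names (`logistic_pf`, `snr_at`, `response_rates`) are illustrative, not the authors' code.

```python
import numpy as np


def logistic_pf(snr_db, srt_db, slope):
    """Assumed logistic psychometric function: proportion correct vs. SNR (dB).

    srt_db: SNR at 50 % correct; slope: slope at the SRT in 1/dB.
    The factor 4 makes the derivative at snr_db == srt_db equal to `slope`.
    """
    return 1.0 / (1.0 + np.exp(-4.0 * slope * (np.asarray(snr_db) - srt_db)))


def snr_at(snr_db, scores, target=0.5):
    """SNR at which a monotonically increasing score curve reaches `target`.

    Uses linear interpolation; np.interp requires `scores` to be increasing.
    """
    return float(np.interp(target, scores, snr_db))


def response_rates(confusion):
    """Per-phoneme correct response rate from a confusion matrix.

    Rows are presented phonemes, columns are responses, so the rate is the
    diagonal count divided by the row sum.
    """
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)


# Hypothetical example: "human" and "model" curves with equal slope, where the
# model SRT lies 13 dB higher (values are illustrative only).
snr = np.arange(-30.0, 11.0, 1.0)
human = logistic_pf(snr, srt_db=-12.0, slope=0.1)
model = logistic_pf(snr, srt_db=1.0, slope=0.1)
offset = snr_at(snr, model) - snr_at(snr, human)  # shift along the SNR axis

# Hypothetical 2-phoneme confusion matrix (counts of responses per stimulus).
rates = response_rates([[8, 2],
                        [1, 9]])

print(f"SNR offset: {offset:.1f} dB")
print("response rates:", rates)
```

Measuring the offset at the 50 % point keeps the comparison independent of the slope; if the predicted and measured curves had different slopes, the shift would depend on the chosen target level.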


Bibliographic reference. Jürgens, Tim / Brand, Thomas / Kollmeier, Birger (2007): "Modelling the human-machine gap in speech reception: microscopic speech intelligibility prediction for normal-hearing subjects with an auditory model", In INTERSPEECH-2007, 410-413.