A new single-ended (i.e., reference-free) measure for predicting the perceived listening effort of noisy speech is presented. It is based on phoneme posterior probabilities (posteriorgrams) obtained from the deep neural network of an automatic speech recognition system. Additive noise and other distortions of speech tend to smear the posteriorgrams. This smearing is quantified by a performance measure, which serves as a predictor of the perceived listening effort required to understand the noisy speech. The proposed measure was evaluated on a database obtained from the subjective evaluation of noise reduction algorithms of commercial hearing aids. Listening effort ratings of processed noisy speech samples were gathered from 20 hearing-impaired subjects. Averaged subjective ratings were compared with the corresponding predictions of the proposed method, the ITU-T standard P.563 for single-ended speech quality assessment, the American National Standard ANIQUE+ for single-ended speech quality assessment, and a single-ended SNR estimator. The proposed method achieved a good correlation with mean subjective ratings and clearly outperformed the standard speech quality measures and the SNR estimator.
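To make the idea of posteriorgram "smearing" concrete, the following is a minimal sketch, not the measure used in the paper. The abstract does not specify the performance measure, so mean per-frame entropy is used here as a stand-in; the function name `mean_frame_entropy` and the toy posteriorgrams are assumptions made purely for illustration.

```python
import numpy as np

def mean_frame_entropy(posteriorgram, eps=1e-12):
    """Average per-frame entropy (in bits) of a (frames x phonemes) posteriorgram.

    Stand-in smearing measure (an assumption, not the paper's method):
    a sharp posteriorgram concentrates probability on one phoneme per
    frame (low entropy), a smeared one spreads it out (high entropy).
    """
    p = np.asarray(posteriorgram, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)          # renormalise each frame
    frame_entropy = -np.sum(p * np.log2(p + eps), axis=1)
    return float(frame_entropy.mean())

# Hypothetical toy data: 3 frames, 4 phoneme classes.
sharp = np.array([
    [0.97, 0.01, 0.01, 0.01],
    [0.01, 0.97, 0.01, 0.01],
    [0.01, 0.01, 0.97, 0.01],
])
smeared = np.full((3, 4), 0.25)                    # uniform: maximally smeared

sharp_score = mean_frame_entropy(sharp)
smeared_score = mean_frame_entropy(smeared)
print(sharp_score < smeared_score)
```

In this sketch a higher score would correspond to a more smeared posteriorgram and hence, under the abstract's hypothesis, to higher predicted listening effort. Any mapping from such a score to subjective ratings would still have to be fitted and validated against listening test data.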
Cite as: Huber, R., Spille, C., Meyer, B.T. (2017) Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition. Proc. Interspeech 2017, 1168-1172, doi: 10.21437/Interspeech.2017-1360
@inproceedings{huber17_interspeech,
  author={Rainer Huber and Constantin Spille and Bernd T. Meyer},
  title={{Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1168--1172},
  doi={10.21437/Interspeech.2017-1360}
}