Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition

Rainer Huber, Constantin Spille, Bernd T. Meyer


A new single-ended (i.e., reference-free) measure for predicting the perceived listening effort of noisy speech is presented. It is based on phoneme posterior probabilities (posteriorgrams) obtained from the deep neural network of an automatic speech recognition system. Additive noise and other distortions of speech tend to smear the posteriorgrams. The smearing is quantified by a performance measure, which is used as a predictor of the perceived listening effort required to understand the noisy speech. The proposed measure was evaluated using a database obtained from the subjective evaluation of noise reduction algorithms of commercial hearing aids. Listening effort ratings of processed noisy speech samples were gathered from 20 hearing-impaired subjects. Averaged subjective ratings were compared with the corresponding predictions computed by the proposed method, the ITU-T standard P.563 for single-ended speech quality assessment, the American National Standard ANIQUE+ for single-ended speech quality assessment, and a single-ended SNR estimator. The proposed method achieved a good correlation with mean subjective ratings and clearly outperformed the standard speech quality measures and the SNR estimator.
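
The idea of quantifying posteriorgram smearing can be illustrated with a minimal sketch. The abstract does not name the performance measure, so the code below assumes one plausible choice: the mean symmetric Kullback-Leibler divergence between posterior vectors separated by a range of time lags. The function names, the lag range, and the toy posteriorgrams (synthetic data, not DNN output) are all hypothetical and serve only to show that a smeared posteriorgram yields a lower score than a sharp one.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between rows of p and q (each row a distribution)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * np.log(p / q), axis=-1) + np.sum(q * np.log(q / p), axis=-1)


def mean_temporal_distance(post, lags=range(1, 11)):
    """Average divergence between frames `lag` steps apart (assumed measure).

    post: array of shape (T, K), one phoneme posterior vector per frame.
    Sharp, well-separated posteriors give large values; smeared ones give small values.
    """
    per_lag = [symmetric_kl(post[:-lag], post[lag:]).mean() for lag in lags]
    return float(np.mean(per_lag))


def smear(post, alpha):
    """Toy distortion: blend every frame with the uniform distribution."""
    n_classes = post.shape[1]
    return (1.0 - alpha) * post + alpha / n_classes


# Toy "clean" posteriorgram: 20 phoneme segments of 10 frames each,
# with 0.99 of the probability mass on the active phoneme class.
rng = np.random.default_rng(0)
T, K = 200, 40
labels = np.repeat(rng.integers(0, K, size=T // 10), 10)
clean = np.full((T, K), 0.01 / (K - 1))
clean[np.arange(T), labels] = 0.99

# Toy "noisy" posteriorgram: the same frames, heavily smeared.
noisy = smear(clean, alpha=0.8)

score_clean = mean_temporal_distance(clean)
score_noisy = mean_temporal_distance(noisy)
print(f"clean: {score_clean:.3f}  noisy: {score_noisy:.3f}")
```

In a real setting, `post` would come from the acoustic model of a speech recognizer, and the resulting score would be compared against averaged listening effort ratings; neither step is reproduced here.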


DOI: 10.21437/Interspeech.2017-1360

Cite as: Huber, R., Spille, C., Meyer, B.T. (2017) Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition. Proc. Interspeech 2017, 1168-1172, DOI: 10.21437/Interspeech.2017-1360.


@inproceedings{Huber2017,
  author={Rainer Huber and Constantin Spille and Bernd T. Meyer},
  title={Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1168--1172},
  doi={10.21437/Interspeech.2017-1360},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1360}
}