Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs

Mahdie Karbasi, Ahmed Hussen Abdelaziz, Hendrik Meutzner, Dorothea Kolossa


Automatic prediction of speech intelligibility is highly desirable in the speech research community, since listening tests are time-consuming and cannot be used online. Most of the available objective speech intelligibility measures are intrusive methods, as they require a clean reference signal in addition to the corresponding noisy/processed signal at hand. In order to overcome the problem of predicting speech intelligibility in the absence of a clean reference signal, we have proposed in [1] to employ a recognition/synthesis framework called the twin hidden Markov model (THMM) for synthesizing the clean features required inside an intrusive intelligibility prediction method. The new framework predicts speech intelligibility as well as well-known intrusive methods such as the short-time objective intelligibility (STOI) measure. The original THMM, however, requires the correct transcription for synthesizing the clean reference features, which is not always available. In this paper, we go one step further and investigate the use of the recognized transcription instead of the oracle transcription to obtain a more widely applicable speech intelligibility prediction. We show that the output of the newly proposed blind approach is highly correlated with human speech recognition results, collected via crowdsourcing in different noise conditions.
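The intrusive core that the THMM framework feeds can be sketched as follows: compute short-time spectral features for the reference and the degraded signal, correlate them frame by frame, and average. This is a deliberately simplified, hypothetical illustration of an intrusive correlation-based measure, not the actual STOI or THMM algorithm from the paper; the function name `intrusive_score` and all parameter values are assumptions for the sketch. In the blind setting described above, the `reference` input would be replaced by the THMM-synthesized clean features rather than the true clean signal.

```python
import numpy as np

def intrusive_score(reference, degraded, frame_len=256, hop=128):
    """Toy intrusive intelligibility measure: mean short-time correlation
    between log-magnitude spectra of a reference and a degraded signal.
    Illustration only, NOT the STOI or THMM algorithm of the paper."""
    def frames(x):
        # slice the signal into overlapping windowed frames
        n = 1 + (len(x) - frame_len) // hop
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
        return x[idx] * np.hanning(frame_len)

    def log_spec(x):
        # log-magnitude spectrum per frame (epsilon avoids log(0))
        return np.log(np.abs(np.fft.rfft(frames(x), axis=1)) + 1e-8)

    R, D = log_spec(np.asarray(reference)), log_spec(np.asarray(degraded))
    # frame-wise Pearson correlation between reference and degraded spectra
    Rc = R - R.mean(axis=1, keepdims=True)
    Dc = D - D.mean(axis=1, keepdims=True)
    corr = (Rc * Dc).sum(axis=1) / (
        np.linalg.norm(Rc, axis=1) * np.linalg.norm(Dc, axis=1) + 1e-8)
    # average over frames: higher means the degraded signal better
    # preserves the reference's short-time spectral shape
    return float(corr.mean())

# usage: a clean tone as reference vs. a noise-corrupted copy
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 0.01 * np.arange(4096))
noisy = clean + 0.5 * rng.standard_normal(4096)
print(intrusive_score(clean, clean))  # identical signals score near 1
print(intrusive_score(clean, noisy))  # additive noise lowers the score
```

A blind variant would keep this scoring function unchanged and only swap the reference: instead of the true clean signal, it would use features synthesized from a (possibly recognized, rather than oracle) transcription.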


DOI: 10.21437/Interspeech.2016-155

Cite as

Karbasi, M., Abdelaziz, A.H., Meutzner, H., Kolossa, D. (2016) Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs. Proc. Interspeech 2016, 625-629.

BibTeX
@inproceedings{Karbasi+2016,
  author={Mahdie Karbasi and Ahmed Hussen Abdelaziz and Hendrik Meutzner and Dorothea Kolossa},
  title={Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-{HMM}s},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-155},
  url={http://dx.doi.org/10.21437/Interspeech.2016-155},
  pages={625--629}
}