We examine existing and novel automatically derived acoustic metrics that are predictive of speech intelligibility. We hypothesize that the degree of variability in feature space is correlated with the extent of a speaker’s phonemic inventory and the magnitude of their articulatory displacements, and thus with their perceived speech intelligibility. We begin by using fully automatic F1/F2 formant frequency trajectories, both for vowel space area calculation and as input to a proposed class-separability metric. We then switch to representing vowels by means of short-term spectral features and measure vowel separability in that space. Finally, we consider the case where phonetic labeling is unavailable; here we calculate short-term spectral features for the entire speech utterance and then estimate their entropy based on the length of a minimum spanning tree. In an alternative approach, we propose to first segment the speech signal using a hidden Markov model, and then calculate spectral feature separability based on the automatically derived classes. We apply all approaches to a database containing both healthy controls and speakers with mild dysarthria, and report the resulting coefficients of determination.
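As a minimal sketch of the first metric, vowel space area can be computed as the area of the convex hull spanned by measured (F1, F2) points. The convex-hull formulation and the example formant values below are illustrative assumptions, not necessarily the paper's exact procedure:

```python
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_area(f1_hz: np.ndarray, f2_hz: np.ndarray) -> float:
    """Area (in Hz^2) of the convex hull spanned by (F1, F2) measurements."""
    points = np.column_stack((f1_hz, f2_hz))
    hull = ConvexHull(points)
    return hull.volume  # for 2-D input, .volume is the enclosed area

# hypothetical formant values for the corner vowels of one speaker
f1 = np.array([300.0, 750.0, 350.0, 660.0])
f2 = np.array([2300.0, 1200.0, 800.0, 1700.0])
print(vowel_space_area(f1, f2))
```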
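The class-separability metric could take many forms; one standard textbook choice is the trace of the within-class scatter inverse times the between-class scatter, sketched below over per-frame feature vectors grouped by vowel label. The paper's proposed metric may differ in detail:

```python
import numpy as np

def class_separability(features: np.ndarray, labels: np.ndarray) -> float:
    """features: (n_frames, n_dims); labels: (n_frames,) vowel class IDs."""
    mu = features.mean(axis=0)
    d = features.shape[1]
    sw = np.zeros((d, d))  # within-class scatter
    sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        x = features[labels == c]
        mu_c = x.mean(axis=0)
        sw += (x - mu_c).T @ (x - mu_c)
        sb += len(x) * np.outer(mu_c - mu, mu_c - mu)
    # larger values indicate compact classes with well-separated means
    return float(np.trace(np.linalg.pinv(sw) @ sb))
```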
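For the label-free case, the total edge length of a minimum spanning tree over the per-frame spectral features grows with the spread (entropy) of the feature distribution. The sketch below omits the constants of the full Rényi-entropy estimator and returns only a normalized MST length, under the assumption of Euclidean frame distances:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_length(features: np.ndarray) -> float:
    """features: (n_frames, n_dims) spectral feature vectors."""
    dist = squareform(pdist(features))  # dense pairwise Euclidean distances
    mst = minimum_spanning_tree(dist)   # sparse matrix of MST edge weights
    return float(mst.sum()) / len(features)  # length normalized by frame count
```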
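Finally, the alternative approach replaces phonetic labels with HMM states. A sketch using the hmmlearn package is given below; the state count, covariance type, and the use of hmmlearn at all are assumptions, as the paper's HMM configuration is not specified here. The resulting per-frame labels can then be passed to a separability measure such as `class_separability` above:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def hmm_state_labels(features: np.ndarray, n_states: int = 8) -> np.ndarray:
    """Fit an HMM to the frame sequence and return per-frame state labels."""
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=25, random_state=0)
    model.fit(features)             # features: (n_frames, n_dims)
    return model.predict(features)  # Viterbi state sequence, one label per frame
```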
Cite as: Kain, A., Giudice, M.D., Tjaden, K. (2017) A Comparison of Sentence-Level Speech Intelligibility Metrics. Proc. Interspeech 2017, 1148-1152, doi: 10.21437/Interspeech.2017-567
@inproceedings{kain17_interspeech,
  author={Alexander Kain and Max Del Giudice and Kris Tjaden},
  title={{A Comparison of Sentence-Level Speech Intelligibility Metrics}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1148--1152},
  doi={10.21437/Interspeech.2017-567}
}