Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model

Myungjong Kim, Jun Wang, Hoirin Kim


Dysarthria is a neuro-motor speech disorder that impedes the physical production of speech. Patients with dysarthria often have trouble pronouncing certain sounds, resulting in undesirable phonetic variation. Current automatic speech recognition systems designed for the general public are ineffective for speakers with dysarthria because of this phonetic variation. In this paper, we investigate dysarthric speech recognition using Kullback-Leibler divergence-based hidden Markov models. In this model, the emission probability of each state is modeled by a categorical distribution using phoneme posterior probabilities from a deep neural network, so it can effectively capture the phonetic variation of dysarthric speech. Experimental evaluation on a database of several hundred words uttered by 30 speakers, consisting of 12 mildly dysarthric, 8 moderately dysarthric, and 10 control speakers, showed that our approach provides substantial improvement over conventional Gaussian mixture model and deep neural network based speech recognition systems.
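
To make the state model described in the abstract concrete, here is a minimal sketch of how a Kullback-Leibler divergence can score a frame of phoneme posteriors against per-state categorical distributions. It is an illustration under stated assumptions, not the authors' implementation: the three-phoneme inventory, the numeric values, the function names, and the choice of divergence direction, KL(state || posterior), are all hypothetical, since the abstract does not specify them.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two categorical distributions of equal length.

    eps is an assumed smoothing constant that avoids log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def score_states(posterior, state_distributions):
    """Return (best_state_index, costs) for one frame of phoneme posteriors.

    Each cost is the KL divergence between a state's categorical
    distribution and the posterior vector; lower means a better match.
    """
    costs = [kl_divergence(y, posterior) for y in state_distributions]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs


# Hypothetical categorical distributions over a 3-phoneme inventory.
state_distributions = [
    [0.8, 0.1, 0.1],  # state dominated by phoneme 0
    [0.1, 0.8, 0.1],  # state dominated by phoneme 1
    [0.4, 0.5, 0.1],  # state that has absorbed a 0/1 confusion
]

# Hypothetical posterior frame in which phonemes 0 and 1 are confused.
posterior = [0.45, 0.45, 0.10]

best, costs = score_states(posterior, state_distributions)
print(best, [round(c, 4) for c in costs])
```

In this toy example the third state gives the lowest cost, because its categorical distribution already encodes the confusion between the first two phonemes. That is the intuition behind letting each state hold a full distribution over phoneme posteriors rather than a single phoneme label.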


DOI: 10.21437/Interspeech.2016-776

Cite as

Kim, M., Wang, J., Kim, H. (2016) Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model. Proc. Interspeech 2016, 2671-2675.

BibTeX
@inproceedings{Kim+2016,
author={Myungjong Kim and Jun Wang and Hoirin Kim},
title={Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-776},
url={http://dx.doi.org/10.21437/Interspeech.2016-776},
pages={2671--2675}
}