Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition

Bhavik Vachhani, Chitralekha Bhat, Sunil Kumar Kopparapu


Dysarthria refers to a speech disorder caused by trauma to the brain areas concerned with motor aspects of speech, giving rise to effortful, slow, slurred or prosodically abnormal speech. Traditional Automatic Speech Recognition (ASR) systems perform poorly on dysarthric speech recognition tasks, owing mostly to insufficient dysarthric speech data. Speaker-related challenges complicate the collection of dysarthric speech data. In this paper, we explore data augmentation using temporal and speed modifications of healthy speech to simulate dysarthric speech. DNN-HMM based ASR and Random Forest based classification were used to evaluate the proposed method. Synthetically generated dysarthric speech is classified for severity using a Random Forest classifier trained on actual dysarthric speech. An ASR system trained on healthy speech augmented with simulated dysarthric speech is evaluated on dysarthric speech recognition. All evaluations were carried out using the Universal Access dysarthric speech corpus. Absolute improvements of 4.24% and 2% were achieved using tempo-based and speed-based data augmentation, respectively, compared with an ASR system trained on healthy speech alone.
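The abstract distinguishes tempo-based from speed-based modification of healthy speech. The sketch below illustrates that difference; it is not the authors' implementation. The abstract does not say which tools or modification factors were used, so the function names `speed_perturb` and `tempo_perturb`, the frame length and the example factor of 0.8 are all hypothetical. Speed modification is shown as plain resampling, which stretches duration and shifts pitch together. Tempo modification is shown as plain overlap-add (OLA), a simplified stand-in that changes duration while roughly keeping the pitch. Production tools typically use WSOLA or a phase vocoder to reduce the phase artefacts that plain OLA introduces.

```python
import numpy as np


def speed_perturb(x: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical speed modification by resampling (linear interpolation).

    Duration scales by 1/factor and pitch scales by factor, so factor < 1
    gives slower, lower-pitched speech.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(x) / factor))
    # Position in the original signal that each output sample is read from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(x)), x)


def tempo_perturb(x: np.ndarray, factor: float, frame_len: int = 1024) -> np.ndarray:
    """Hypothetical tempo modification by plain overlap-add (OLA).

    Frames are read with an analysis hop of hop_syn * factor and written with a
    fixed synthesis hop, so duration scales by about 1/factor while each frame's
    short-time content (and hence, roughly, the pitch) is kept. Plain OLA does
    not align phases between frames, so the pitch is only approximately kept.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    hop_syn = frame_len // 2                          # synthesis hop (50% overlap)
    hop_ana = max(1, int(round(hop_syn * factor)))    # analysis hop
    window = np.hanning(frame_len)
    n_frames = max(1, (len(x) - frame_len) // hop_ana + 1)
    out_len = (n_frames - 1) * hop_syn + frame_len
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        frame = x[i * hop_ana : i * hop_ana + frame_len]
        if len(frame) < frame_len:
            # Zero-pad the final frame if the input is too short.
            frame = np.pad(frame, (0, frame_len - len(frame)))
        start = i * hop_syn
        y[start : start + frame_len] += frame * window
        norm[start : start + frame_len] += window
    # Divide by the summed windows to undo amplitude changes from overlapping.
    norm[norm < 1e-8] = 1.0
    return y / norm
```

For example, applied to one second of a 200 Hz tone sampled at 16 kHz with a factor of 0.8, `speed_perturb` returns 1.25 s of a 160 Hz tone, whereas `tempo_perturb` returns roughly 1.2 s of audio whose dominant frequency stays near 200 Hz. Slowing speech down without lowering its pitch is one way to mimic the slow speaking rate associated with dysarthria.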


DOI: 10.21437/Interspeech.2018-1751

Cite as: Vachhani, B., Bhat, C., Kopparapu, S.K. (2018) Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition. Proc. Interspeech 2018, 471-475, DOI: 10.21437/Interspeech.2018-1751.


@inproceedings{Vachhani2018,
  author={Bhavik Vachhani and Chitralekha Bhat and Sunil Kumar Kopparapu},
  title={Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={471--475},
  doi={10.21437/Interspeech.2018-1751},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1751}
}