Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder

Chitralekha Bhat, Biswajit Das, Bhavik Vachhani, Sunil Kumar Kopparapu


Dysarthria is a manifestation of disruption in neuromuscular physiology, resulting in uneven, slow, slurred, harsh or quiet speech. Dysarthric speech poses serious challenges to automatic speech recognition (ASR), since such speech is difficult to decipher for both humans and machines. The objective of this work is to enhance dysarthric speech features to match those of healthy control speech. We use a Time-Delay Neural Network based Denoising Autoencoder (TDNN-DAE) to enhance the dysarthric speech features. The enhanced dysarthric speech is then recognized using a DNN-HMM based ASR engine. This methodology was evaluated for speaker-independent (SI) and speaker-adapted (SA) systems. Absolute improvements of 13% and 3% were observed in ASR performance for the SI and SA systems respectively, compared with recognition of unenhanced dysarthric speech.
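To make the TDNN-DAE idea concrete, the sketch below shows a forward pass and a denoising objective in NumPy. It is a minimal illustration under stated assumptions, not the authors' configuration. Dysarthric features are treated as the "noisy" input and healthy control features as the "clean" target, and the two sequences are assumed to be frame-aligned. Each TDNN layer splices frames at fixed time offsets and applies an affine transform with a ReLU. The context offsets, layer sizes, edge padding, activation, and the names `splice`, `tdnn_layer`, `TdnnDae` and `dae_loss` are all hypothetical. Training by backpropagation is omitted.

```python
import numpy as np

def splice(frames, offsets):
    """Concatenate each frame with its neighbours at the given time offsets.

    Assumption: edges are padded by repeating the first/last frame.
    """
    T = frames.shape[0]
    idx = np.arange(T)
    parts = [frames[np.clip(idx + o, 0, T - 1)] for o in offsets]
    return np.concatenate(parts, axis=1)

def tdnn_layer(x, W, b, offsets):
    """One TDNN layer: splice temporal context, affine transform, ReLU."""
    return np.maximum(splice(x, offsets) @ W + b, 0.0)

class TdnnDae:
    """Hypothetical TDNN-DAE mapping (T, D) input features to (T, D) outputs."""

    def __init__(self, feat_dim, hidden_dim, contexts, seed=0):
        rng = np.random.default_rng(seed)
        self.contexts = contexts
        self.params = []
        in_dim = feat_dim
        for offsets in contexts:
            fan_in = in_dim * len(offsets)
            W = rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, hidden_dim))
            self.params.append((W, np.zeros(hidden_dim)))
            in_dim = hidden_dim
        # Linear output layer projects back to the feature dimension.
        self.W_out = rng.normal(0.0, np.sqrt(1.0 / in_dim), (in_dim, feat_dim))
        self.b_out = np.zeros(feat_dim)

    def enhance(self, x):
        h = x
        for (W, b), offsets in zip(self.params, self.contexts):
            h = tdnn_layer(h, W, b, offsets)
        return h @ self.W_out + self.b_out

def dae_loss(enhanced, target):
    """Denoising objective: mean squared error against healthy-speech features."""
    return float(np.mean((enhanced - target) ** 2))

# Usage with synthetic stand-in data (not real speech features).
T, D = 50, 40
rng = np.random.default_rng(1)
healthy = rng.normal(size=(T, D))
dysarthric = healthy + 0.5 * rng.normal(size=(T, D))
model = TdnnDae(D, 64, contexts=[[-2, -1, 0, 1, 2], [-1, 0, 1], [-3, 0, 3]])
enhanced = model.enhance(dysarthric)
print(enhanced.shape)  # (50, 40)
print(dae_loss(enhanced, healthy))
```

Widening offsets in the deeper layers lets each output frame depend on a broad temporal window at modest cost. This is the usual motivation for TDNNs, and the reason the sketch spaces its offsets that way. In the pipeline the abstract describes, the enhanced features would then go to a DNN-HMM recognizer, which is not modelled here.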


DOI: 10.21437/Interspeech.2018-1754

Cite as: Bhat, C., Das, B., Vachhani, B., Kopparapu, S.K. (2018) Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder. Proc. Interspeech 2018, 451-455, DOI: 10.21437/Interspeech.2018-1754.


@inproceedings{Bhat2018,
  author={Chitralekha Bhat and Biswajit Das and Bhavik Vachhani and Sunil Kumar Kopparapu},
  title={Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={451--455},
  doi={10.21437/Interspeech.2018-1754},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1754}
}