Dysarthric Speech Recognition Using Convolutional LSTM Neural Network

Myungjong Kim, Beiming Cao, Kwanghoon An, Jun Wang

Dysarthria is a motor speech disorder that impedes the physical production of speech. Speech in patients with dysarthria is generally characterized by poor articulation, breathy voice, and monotonic intonation. Therefore, modeling the spectral and temporal characteristics of dysarthric speech is critical for better performance in dysarthric speech recognition. Convolutional long short-term memory recurrent neural networks (CLSTM-RNNs) have recently been used successfully in normal speech recognition, but have rarely been used in dysarthric speech recognition. We hypothesized that CLSTM-RNNs have the potential to capture the distinct characteristics of dysarthric speech, taking advantage of convolutional neural networks (CNNs) for extracting effective local features and LSTM-RNNs for modeling the temporal dependencies of those features. In this paper, we investigate the use of CLSTM-RNNs for dysarthric speech recognition. Experimental evaluation on a database collected from nine dysarthric patients showed that our approach provides substantial improvement over both standard CNN and LSTM-RNN based speech recognizers.
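The general idea of pairing a convolutional front end with an LSTM can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' model: the abstract does not specify layer counts, kernel sizes, pooling, input features, or training details, so every dimension, parameter name, and the random initialization below is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_freq(frames, kernels):
    """Valid 1-D convolution along the frequency axis of each frame, then ReLU.

    frames: (T, F) acoustic features; kernels: (K, W) filters.
    Returns local features of shape (T, K * (F - W + 1)).
    """
    T, F = frames.shape
    K, W = kernels.shape
    # Stack every length-W frequency window: shape (T, F - W + 1, W).
    windows = np.stack([frames[:, i:i + W] for i in range(F - W + 1)], axis=1)
    feats = np.maximum(windows @ kernels.T, 0.0)  # (T, F - W + 1, K)
    return feats.reshape(T, -1)

def lstm_forward(inputs, W_x, W_h, b):
    """Single-layer LSTM over time.

    inputs: (T, D); W_x: (D, 4H); W_h: (H, 4H); b: (4H,).
    Returns hidden states of shape (T, H).
    """
    T = inputs.shape[0]
    H = W_h.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = np.zeros((T, H))
    for t in range(T):
        gates = inputs[t] @ W_x + h @ W_h + b
        i, f, g, o = np.split(gates, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs[t] = h
    return outputs

def clstm_forward(frames, params):
    """CNN for local spectral features, LSTM for temporal context, softmax per frame."""
    local = conv1d_freq(frames, params["kernels"])
    hidden = lstm_forward(local, params["W_x"], params["W_h"], params["b"])
    logits = hidden @ params["W_out"] + params["b_out"]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Hypothetical sizes: 50 frames, 40 frequency bins, 8 filters of width 5,
# 16 LSTM units, 10 output classes. None of these come from the paper.
rng = np.random.default_rng(0)
T, F, K, W, H, C = 50, 40, 8, 5, 16, 10
D = K * (F - W + 1)
params = {
    "kernels": 0.1 * rng.standard_normal((K, W)),
    "W_x": 0.1 * rng.standard_normal((D, 4 * H)),
    "W_h": 0.1 * rng.standard_normal((H, 4 * H)),
    "b": np.zeros(4 * H),
    "W_out": 0.1 * rng.standard_normal((H, C)),
    "b_out": np.zeros(C),
}
frames = rng.standard_normal((T, F))  # stand-in for real acoustic features
posteriors = clstm_forward(frames, params)
```

Here `posteriors` holds one class distribution per frame. The convolution sees only a narrow band of frequencies at a time, while the LSTM carries information across frames, which mirrors the division of labor described in the abstract.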

DOI: 10.21437/Interspeech.2018-2250

Cite as: Kim, M., Cao, B., An, K., Wang, J. (2018) Dysarthric Speech Recognition Using Convolutional LSTM Neural Network. Proc. Interspeech 2018, 2948-2952, DOI: 10.21437/Interspeech.2018-2250.

  author={Myungjong Kim and Beiming Cao and Kwanghoon An and Jun Wang},
  title={Dysarthric Speech Recognition Using Convolutional LSTM Neural Network},
  booktitle={Proc. Interspeech 2018},