ISCA Archive Interspeech 2017

Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech

Emre Yılmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik

Incorporating automatic speech recognition (ASR) in individualized speech training applications is becoming more viable thanks to the improved generalization capabilities of neural network-based acoustic models. The main problem in developing applications for dysarthric speech is the relative scarcity of in-domain data. Collecting representative amounts of dysarthric speech data is difficult due to rigorous ethical and medical permission requirements, problems in accessing patients who are generally vulnerable and often subject to changing health conditions and, last but not least, the high variability in speech resulting from different pathological conditions. Developing such applications is even more challenging for languages that in general have fewer resources, fewer speakers and, consequently, also fewer patients than English, as in the case of a mid-sized language like Dutch. In this paper, we investigate a multi-stage deep neural network (DNN) training scheme aimed at better modeling of dysarthric speech using only a small amount of in-domain training data. The results show that the system employing the proposed training scheme considerably improves the recognition of Dutch dysarthric speech compared to baseline systems trained in a single stage on either a large amount of normal speech or a small amount of in-domain data.
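The core idea of multi-stage training, as summarized above, is that the parameters learned in one stage (on plentiful normal speech) initialize the next stage (on scarce in-domain speech). The sketch below illustrates only that generic idea under loud assumptions: a tiny logistic-regression model stands in for the DNN acoustic model, the two synthetic datasets (`out_of_domain`, `in_domain`) and the helper names `make_data`, `train_stage` and `multi_stage_train` are hypothetical, and the number and content of stages are illustrative. It is not the authors' recipe, whose details are not given in this abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical toy setup: logistic regression stands in for a DNN acoustic model.
Example = Tuple[Sequence[float], int]  # (feature vector, binary label)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_log_loss(w: Sequence[float], data: Sequence[Example]) -> float:
    """Average binary cross-entropy; w[-1] is the bias term."""
    total = 0.0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        p = min(max(_sigmoid(z), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train_stage(w_init: Sequence[float], data: Sequence[Example],
                epochs: int, lr: float = 0.5) -> List[float]:
    """Full-batch gradient descent that starts from w_init (copied, not mutated)."""
    w = list(w_init)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            err = _sigmoid(z) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi / n
            grad[-1] += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def multi_stage_train(stages: Sequence[Tuple[Sequence[Example], int]],
                      dim: int) -> List[List[float]]:
    """Each stage continues from the weights produced by the previous stage."""
    w: List[float] = [0.0] * (dim + 1)
    history = []
    for data, epochs in stages:
        w = train_stage(w, data, epochs)
        history.append(w)
    return history


def make_data(rng: random.Random, n: int, shift: float) -> List[Example]:
    """Synthetic 2-D data; 'shift' tilts the decision boundary to mimic a domain change."""
    out = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        out.append((x, 1 if x[0] + shift * x[1] > 0 else 0))
    return out


rng = random.Random(0)
out_of_domain = make_data(rng, 500, shift=0.0)  # stand-in for large normal-speech set
in_domain = make_data(rng, 40, shift=1.0)       # stand-in for small dysarthric set
history = multi_stage_train([(out_of_domain, 100), (in_domain, 50)], dim=2)
w1, w2 = history
print(mean_log_loss(w1, in_domain), mean_log_loss(w2, in_domain))
```

The second stage does not start from scratch: it begins at the first-stage weights `w1`, so the small in-domain set only has to adapt an already reasonable model rather than learn everything on its own.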

doi: 10.21437/Interspeech.2017-303

Cite as: Yılmaz, E., Ganzeboom, M., Cucchiarini, C., Strik, H. (2017) Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech. Proc. Interspeech 2017, 2685-2689, doi: 10.21437/Interspeech.2017-303

  author={Emre Yılmaz and Mario Ganzeboom and Catia Cucchiarini and Helmer Strik},
  title={{Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech}},
  booktitle={Proc. Interspeech 2017},