Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning

Ying Qin, Tan Lee, Siyuan Feng, Anthony Pak Hin Kong


This paper describes an investigation into automatic speech assessment for people with aphasia (PWA) using a DNN-based automatic speech recognition (ASR) system. The main problems addressed are the lack of training speech in the intended application domain and the resulting degradation of ASR performance on the impaired speech of PWA. We adopt the TDNN-BLSTM structure for acoustic modeling and apply multi-task learning with a large amount of domain-mismatched data. This leads to a significant improvement in recognition accuracy compared with a conventional single-task learning DNN system. To facilitate the extraction of robust text features for quantifying language impairment in PWA speech, we propose to incorporate the N-best hypotheses and confusion network representation of the ASR output. The severity of impairment is predicted from text features and supra-segmental duration features using different regression models. Experimental results show a high correlation of 0.842 between the predicted severity level and the subjective Aphasia Quotient score.
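The final step described above, predicting a severity score from per-speaker features and validating it by correlation with the subjective Aphasia Quotient, can be sketched as follows. This is a minimal illustrative example, not the paper's actual model or data: the features, weights, and scores below are synthetic stand-ins for the text and duration features the paper extracts, and ordinary least-squares is only one of the regression models the authors compare.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-speaker features (hypothetical stand-ins, e.g. a lexical
# diversity score and a mean pause duration per speaker).
n = 40
X = rng.uniform(0.0, 1.0, size=(n, 2))

# Synthetic "severity" targets analogous to AQ scores, generated from a
# linear rule plus noise (purely for illustration).
true_w = np.array([60.0, -25.0])
y = 30.0 + X @ true_w + rng.normal(0.0, 2.0, size=n)

# Fit a linear regression with an intercept via least squares.
A = np.hstack([np.ones((n, 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef

# Evaluate with the Pearson correlation between predicted and reference
# scores; the paper reports 0.842 on real PWA data with its full feature set.
r = np.corrcoef(pred, y)[0, 1]
print(f"correlation: {r:.3f}")
```

On this clean synthetic data the fit correlates almost perfectly with the targets; real impaired speech with ASR-derived features is far noisier, which is why the paper's robust text features and the reported 0.842 are meaningful.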


DOI: 10.21437/Interspeech.2018-1630

Cite as: Qin, Y., Lee, T., Feng, S., Kong, A.P.H. (2018) Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning. Proc. Interspeech 2018, 3418-3422, DOI: 10.21437/Interspeech.2018-1630.


@inproceedings{Qin2018,
  author={Ying Qin and Tan Lee and Siyuan Feng and Anthony Pak Hin Kong},
  title={Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3418--3422},
  doi={10.21437/Interspeech.2018-1630},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1630}
}