Automatic Assessment of Language Impairment Based on Raw ASR Output

Ying Qin, Tan Lee, Anthony Pak Hin Kong


For automatic assessment of language impairment in natural speech, properly designed text-based features are needed. Feature design relies on experts' domain knowledge, and feature extraction may undesirably require manual transcription. This paper describes a novel approach to automatic assessment of language impairment in narrative speech of people with aphasia (PWA), without explicit knowledge-driven feature design. A convolutional neural network (CNN) is used to extract language-impairment-related text features from the output of an automatic speech recognition (ASR) system or, if available, from the manual transcription of the input speech. To mitigate the adverse effect of ASR errors, a confusion network is adopted to improve the robustness of the embedding representation of the ASR output. The proposed approach is evaluated on the task of discriminating severe PWA from mild PWA based on Cantonese narrative speech. Experimental results confirm the effectiveness of the automatically learned text features. It is also shown that CNN models trained on text input and on acoustic features are complementary to each other.
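The abstract names two components that can be sketched concretely: a confusion-network-based embedding of the ASR output and a CNN text classifier over the resulting embedding sequence. Below is a minimal PyTorch illustration, not the authors' implementation; the vocabulary size, embedding dimension, filter widths, and the specific weighting scheme (each confusion-network slot represented as the posterior-weighted sum of its candidate word embeddings) are illustrative assumptions rather than details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 5000   # assumed vocabulary size
EMB_DIM = 100       # assumed embedding dimension


def confusion_net_embedding(slots, embedding):
    """Embed a confusion network as a sequence of expected word vectors.

    slots: one list per slot, each of the form [(word_id, posterior), ...].
    Returns a (num_slots, EMB_DIM) tensor whose rows are the
    posterior-weighted sums of the candidate word embeddings.
    """
    vecs = []
    for arcs in slots:
        ids = torch.tensor([w for w, _ in arcs])
        probs = torch.tensor([[p] for _, p in arcs])   # (n_arcs, 1)
        vecs.append((probs * embedding(ids)).sum(dim=0))
    return torch.stack(vecs)


class TextCNN(nn.Module):
    """Text CNN: parallel 1-D convolutions, max-pooling over time, linear output."""

    def __init__(self, emb_dim=EMB_DIM, num_filters=64, widths=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, w) for w in widths)
        self.fc = nn.Linear(num_filters * len(widths), 2)  # mild vs. severe

    def forward(self, x):            # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)        # -> (batch, emb_dim, seq_len)
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # class logits


# Usage: embed one hypothetical confusion network and score it.
embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
cn = [[(12, 0.7), (48, 0.3)], [(5, 1.0)], [(301, 0.6), (9, 0.4)],
      [(77, 0.9), (2, 0.1)], [(5, 1.0)]]
feats = confusion_net_embedding(cn, embedding).unsqueeze(0)  # add batch dim
logits = TextCNN()(feats)                                    # shape (1, 2)

One design point worth noting: max-pooling over time makes the classifier's output dimension independent of utterance length, which is convenient for narrative speech samples of varying duration.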


DOI: 10.21437/Interspeech.2019-1688

Cite as: Qin, Y., Lee, T., Kong, A.P.H. (2019) Automatic Assessment of Language Impairment Based on Raw ASR Output. Proc. Interspeech 2019, 3078-3082, DOI: 10.21437/Interspeech.2019-1688.


@inproceedings{Qin2019,
  author={Ying Qin and Tan Lee and Anthony Pak Hin Kong},
  title={{Automatic Assessment of Language Impairment Based on Raw ASR Output}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3078--3082},
  doi={10.21437/Interspeech.2019-1688},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1688}
}