Child Speech Disorder Detection with Siamese Recurrent Network Using Speech Attribute Features

Jiarui Wang, Ying Qin, Zhiyuan Peng, Tan Lee


Acoustics-based automatic assessment is a highly desirable approach to detecting speech sound disorder (SSD) in children. The performance of an automatic speech assessment system depends heavily on the availability of a sufficient amount of properly annotated disordered speech, and obtaining such data is a critical problem, particularly for child speech. This paper presents a novel design for a child speech disorder detection system that requires only normal speech for model training. The system is based on a Siamese recurrent network, which is trained to learn the similarity and discrepancy between the pronunciations of a pair of phones in the embedding space. To detect speech sound disorder, the trained network measures the distance between the test phone and the desired phone, and this distance is used to train a binary classifier. Speech attribute features are incorporated to measure pronunciation quality and provide diagnostic feedback. Experimental results show that a Siamese recurrent network using a combination of speech attribute features and phone posterior features achieves the best detection accuracy of 0.941.
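
The pipeline described in the abstract (a shared recurrent encoder applied to two phone segments, followed by a distance in the embedding space) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The plain tanh recurrent cell, the random weights, the Euclidean distance, the contrastive loss, and all function names and dimensions are assumptions chosen for illustration; the abstract does not specify the network type, the loss, or the classifier.

```python
import math
import random


def make_encoder(input_dim, hidden_dim, seed=0):
    """Create random weights for a tiny tanh recurrent encoder (hypothetical)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    return w_in, w_rec


def encode(frames, encoder):
    """Map a variable-length sequence of feature frames to a fixed embedding.

    Each frame stands in for a per-frame feature vector, e.g. speech
    attribute or phone posteriors. The final hidden state is the embedding.
    """
    w_in, w_rec = encoder
    h = [0.0] * len(w_in)
    for x in frames:
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hk for w, hk in zip(w_rec[j], h))
            )
            for j in range(len(h))
        ]
    return h


def siamese_distance(frames_a, frames_b, encoder):
    """Encode both segments with the SAME weights and return their distance."""
    emb_a = encode(frames_a, encoder)
    emb_b = encode(frames_b, encoder)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))


def contrastive_loss(distance, same_phone, margin=1.0):
    """Assumed training objective: pull same-phone pairs together and push
    different-phone pairs at least `margin` apart."""
    if same_phone:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Toy usage with made-up 3-dimensional frames of different lengths.
encoder = make_encoder(input_dim=3, hidden_dim=4)
test_phone = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
desired_phone = [[0.1, 0.1, 0.8], [0.0, 0.2, 0.8], [0.1, 0.0, 0.9]]
d = siamese_distance(test_phone, desired_phone, encoder)
print("distance between test and desired phone:", d)
```

Because both segments pass through the same weights, the distance is symmetric and is exactly zero for identical inputs. In a full system along the lines of the abstract, the distance from a trained network would be fed to a binary classifier rather than inspected directly.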


DOI: 10.21437/Interspeech.2019-2320

Cite as: Wang, J., Qin, Y., Peng, Z., Lee, T. (2019) Child Speech Disorder Detection with Siamese Recurrent Network Using Speech Attribute Features. Proc. Interspeech 2019, 3885-3889, DOI: 10.21437/Interspeech.2019-2320.


@inproceedings{Wang2019,
  author={Jiarui Wang and Ying Qin and Zhiyuan Peng and Tan Lee},
  title={{Child Speech Disorder Detection with Siamese Recurrent Network Using Speech Attribute Features}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3885--3889},
  doi={10.21437/Interspeech.2019-2320},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2320}
}