Assessing Child Communication Engagement via Speech Recognition in Naturalistic Active Learning Spaces

Rasa Lileikyte, Dwight Irvin, John H. L. Hansen


The ability to assess children's conversational interaction is critical in determining language and cognitive proficiency for typically developing and at-risk children. The earlier an at-risk child is identified, the earlier support can be provided to reduce the social impact of the speech disorder. To date, limited research has been performed on speech recognition for young children in classroom settings. This study addresses speech recognition with naturalistic speech from children aged 2.5 to 5 years. Data augmentation is relatively underexplored for child speech. Therefore, we investigate the effectiveness of data augmentation techniques to improve both language and acoustic models. We explore alternative text augmentation approaches using adult data, Web data, and text generated by recurrent neural networks. We also compare several acoustic augmentation techniques: speed perturbation, tempo perturbation, and the addition of adult data. Finally, we comment on child word count rates to assess child speech development.
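
For readers unfamiliar with the acoustic augmentation techniques named above, the following is a minimal sketch of speed perturbation, which resamples the waveform so that both duration and pitch change. Tempo perturbation, by contrast, changes duration while preserving pitch and is not shown here. The function name `speed_perturb`, the factors 0.9 and 1.1, the 16 kHz sampling rate, and the synthetic test tone are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample `signal` by linear interpolation.

    Played back at the original sampling rate, the result is `factor`
    times faster (factor > 1) or slower (factor < 1), and the pitch
    shifts by the same ratio. Hypothetical helper for illustration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(signal) / factor))
    # Fractional positions in the original signal read by each output sample.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(src_positions, np.arange(len(signal)), signal)


# Assumed example input: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

fast = speed_perturb(tone, 1.1)  # shorter copy, higher pitch
slow = speed_perturb(tone, 0.9)  # longer copy, lower pitch
```

Each perturbed copy would be added to the training set alongside the original utterance with the same transcript. Linear interpolation keeps the sketch short; a production pipeline would more likely use a band-limited resampler.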


DOI: 10.21437/Odyssey.2020-56

Cite as: Lileikyte, R., Irvin, D., Hansen, J.H.L. (2020) Assessing Child Communication Engagement via Speech Recognition in Naturalistic Active Learning Spaces. Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 396-401, DOI: 10.21437/Odyssey.2020-56.


@inproceedings{Lileikyte2020,
  author={Rasa Lileikyte and Dwight Irvin and John H. L. Hansen},
  title={{Assessing Child Communication Engagement via Speech Recognition in Naturalistic Active Learning Spaces}},
  year=2020,
  booktitle={Proc. Odyssey 2020 The Speaker and Language Recognition Workshop},
  pages={396--401},
  doi={10.21437/Odyssey.2020-56},
  url={http://dx.doi.org/10.21437/Odyssey.2020-56}
}