ISCA Archive Interspeech 2021

Simulating Reading Mistakes for Child Speech Transformer-Based Phone Recognition

Lucile Gelin, Thomas Pellegrini, Julien Pinquier, Morgane Daniel

Current performance of automatic speech recognition (ASR) for children is below that of the latest systems dedicated to adult speech. Child speech is particularly difficult to recognise, and substantial corpora for training acoustic models are lacking. Furthermore, in the scope of our reading assistant for 5–8-year-old children learning to read, models need to cope with disfluencies and reading mistakes, which remain considerable challenges even for state-of-the-art ASR systems. In this paper, we adapt an end-to-end Transformer acoustic model to speech from children learning to read. Transfer learning (TL) with a small amount of child speech improves the phone error rate (PER) by 48.7% relative over an adult model and outperforms a TL-adapted DNN-HMM model by 21.0% relative PER. Multi-objective training with a Connectionist Temporal Classification (CTC) function further reduces the PER by 4.8% relative. We propose a method of reading-mistake data augmentation, in which we simulate word-level repetitions and substitutions with phonetically or graphically close words. Combining these two types of reading mistakes reaches a 19.9% PER, a 13.1% relative improvement over the baseline. A detailed analysis shows that both the CTC multi-objective training and the augmentation with synthetic repetitions help the attention mechanisms better detect children’s disfluencies.
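The augmentation described above can be illustrated with a minimal sketch. The function name, probabilities, and the `close_words` lookup table are hypothetical illustrations, not the authors' exact procedure: a repetition duplicates a word in the transcript, and a substitution swaps a word for a phonetically or graphically close one.

```python
import random

def simulate_reading_mistakes(words, close_words, p_rep=0.1, p_sub=0.1, rng=None):
    """Sketch of word-level reading-mistake augmentation (hypothetical helper).

    words       -- list of words in the reference transcript
    close_words -- dict mapping a word to phonetically/graphically close words
    p_rep       -- probability of simulating a repetition of the current word
    p_sub       -- probability of substituting the word with a close word
    """
    rng = rng or random.Random(0)
    augmented = []
    for word in words:
        if rng.random() < p_rep:
            augmented.append(word)  # repetition: the child says the word twice
        if rng.random() < p_sub and close_words.get(word):
            # substitution: the child reads a close word instead
            augmented.append(rng.choice(close_words[word]))
        else:
            augmented.append(word)
    return augmented
```

For example, with `p_rep=1.0` every word is repeated once, so `["the", "cat"]` becomes `["the", "the", "cat", "cat"]`; with both probabilities set to 0 the transcript is returned unchanged.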

doi: 10.21437/Interspeech.2021-2202

Cite as: Gelin, L., Pellegrini, T., Pinquier, J., Daniel, M. (2021) Simulating Reading Mistakes for Child Speech Transformer-Based Phone Recognition. Proc. Interspeech 2021, 3860-3864, doi: 10.21437/Interspeech.2021-2202

@inproceedings{gelin21_interspeech,
  author={Lucile Gelin and Thomas Pellegrini and Julien Pinquier and Morgane Daniel},
  title={{Simulating Reading Mistakes for Child Speech Transformer-Based Phone Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3860--3864},
  doi={10.21437/Interspeech.2021-2202}
}