As part of our oral reading exercise for children aged 5 to 8, models must precisely detect and diagnose reading mistakes, which remains a considerable challenge even for state-of-the-art ASR systems. In this paper, we compare hybrid and end-to-end acoustic models trained for phoneme recognition on young learners' speech. We evaluate them not only with the phoneme error rate (PER) but also with detailed phoneme-level misread detection and diagnosis metrics. We show that a traditional TDNNF-HMM model, despite a high PER, is the best at detecting reading mistakes (F1-score 72.6%), but at the cost of low precision (73.8%) and specificity (74.7%), two metrics that are pedagogically critical. A recent Transformer+CTC model, to which we applied our synthetic reading mistakes augmentation method, obtains the highest precision (81.8%) and specificity (86.3%), as well as the highest correct diagnosis rate (70.7%), making it the best fit for our application.
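For readers less familiar with the detection metrics quoted in the abstract, the following is a minimal sketch of how precision, recall, specificity, and F1-score are conventionally derived from phoneme-level confusion counts. It assumes that a misread phoneme is the positive class, which the abstract does not state explicitly; the names `DetectionCounts` and `detection_metrics` are hypothetical, and the counts in the usage note are illustrative only, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Phoneme-level confusion counts (assumption: 'misread' is the positive class)."""
    tp: int  # misread phonemes flagged as misread
    fp: int  # correctly read phonemes flagged as misread (false alarms)
    tn: int  # correctly read phonemes accepted as correct
    fn: int  # misread phonemes accepted as correct (missed mistakes)


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: DetectionCounts) -> dict:
    """Compute the standard detection metrics from confusion counts."""
    precision = _ratio(c.tp, c.tp + c.fp)    # share of flagged phonemes that were truly misread
    recall = _ratio(c.tp, c.tp + c.fn)       # share of misread phonemes that were flagged
    specificity = _ratio(c.tn, c.tn + c.fp)  # share of correct phonemes that were accepted
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }
```

With illustrative counts such as `DetectionCounts(tp=30, fp=10, tn=50, fn=10)`, the function returns a dictionary of the four rates. Under the stated assumption, low precision or specificity means that correctly read phonemes are frequently flagged as mistakes, which is one reading of why these two metrics matter for young learners.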
Cite as: Gelin, L., Daniel, M., Pellegrini, T., Pinquier, J. (2023) Comparing phoneme recognition systems on the detection and diagnosis of reading mistakes for young children's oral reading evaluation. Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE), 6-10, doi: 10.21437/SLaTE.2023-2
@inproceedings{gelin23_slate,
  author={Lucile Gelin and Morgane Daniel and Thomas Pellegrini and Julien Pinquier},
  title={{Comparing phoneme recognition systems on the detection and diagnosis of reading mistakes for young children's oral reading evaluation}},
  year=2023,
  booktitle={Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE)},
  pages={6--10},
  doi={10.21437/SLaTE.2023-2}
}