This paper describes the systems developed by the University of Birmingham for the 2019 Spoken CALL Shared Task (ST) challenge. The task is the automatic assessment of grammatical and semantic aspects of English spoken by German-speaking Swiss teenagers. Our system has two main components: automatic speech recognition (ASR) and text processing (TP). We use the ASR system that we developed for the 2018 ST challenge. This is a DNN-HMM system based on sequence training with the state-level minimum Bayes risk (sMBR) criterion. It achieved word error rates (WERs) of 8.89% on the ST2 test set and 10.94% on the ST3 test set. This paper focuses on the development of the TP component. In particular, we explore machine learning (ML) approaches that preserve different degrees of word order. The ST responses are represented as vectors using Word2Vec and Doc2Vec models, and the similarities between ASR transcriptions and reference responses are calculated using Word Mover's Distance (WMD) and Dynamic Programming (DP). A baseline rule-based TP system obtained Dfull scores of 5.639 and 5.476 on the ST2 and ST3 test sets, respectively. The best ML-based TP system, consisting of a Word2Vec model trained on the ST data, DP-based similarity calculation and a neural network, achieved Dfull scores of 7.379 and 5.740 on the ST2 and ST3 test sets, respectively.
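As an illustration of how a DP-based similarity can preserve word order, the following is a minimal sketch and not the authors' implementation. It aligns two word sequences with an edit-distance style recursion in which the substitution cost is the cosine distance between word vectors. The toy vectors, the function names and the `gap_cost` parameter are assumptions made for this example; in the system described above the vectors would come from a trained Word2Vec model.

```python
import math

# Toy 2-D word vectors, invented for illustration only (not from the paper).
TOY_VECTORS = {
    "i": (1.0, 0.0),
    "want": (0.0, 1.0),
    "like": (0.1, 0.9),
    "a": (1.0, 1.0),
    "ticket": (-1.0, 0.5),
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dp_distance(hyp, ref, vectors, gap_cost=1.0):
    """Order-preserving alignment cost between two word lists.

    Substituting one word for another costs their cosine distance;
    skipping a word in either sequence costs gap_cost (an assumed value).
    """
    n, m = len(hyp), len(ref)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cosine_distance(vectors[hyp[i - 1]], vectors[ref[j - 1]])
            d[i][j] = min(
                d[i - 1][j - 1] + sub,    # align hyp word with ref word
                d[i - 1][j] + gap_cost,   # skip a hyp word
                d[i][j - 1] + gap_cost,   # skip a ref word
            )
    return d[n][m]


same = dp_distance("i want a ticket".split(), "i want a ticket".split(), TOY_VECTORS)
close = dp_distance("i like a ticket".split(), "i want a ticket".split(), TOY_VECTORS)
shuffled = dp_distance("ticket a want i".split(), "i want a ticket".split(), TOY_VECTORS)
```

In this sketch a response with the same words in a different order receives a non-zero cost, whereas a measure such as WMD, which treats a response as an unordered collection of word vectors, would not distinguish the two orderings.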
Cite as: Qian, M., Jančovič, P., Russell, M. (2019) The University of Birmingham 2019 Spoken CALL Shared Task Systems: Exploring the importance of word order in text processing. Proc. 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2019), 11-15, doi: 10.21437/SLaTE.2019-3
@inproceedings{qian19_slate,
  author={Mengjie Qian and Peter Jančovič and Martin Russell},
  title={{The University of Birmingham 2019 Spoken CALL Shared Task Systems: Exploring the importance of word order in text processing}},
  year=2019,
  booktitle={Proc. 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2019)},
  pages={11--15},
  doi={10.21437/SLaTE.2019-3}
}