This paper presents a scoring system that achieved the top result on the text subset of the CALL v3 shared task. The system is based on text embeddings, namely NNLM [1] and BERT [2]. Its distinguishing feature is that it does not rely on the reference grammar file for scoring. The model is compared against approaches that use the grammar file, and the comparison shows that similar or even higher results can be achieved without a predefined set of correct answers. The paper describes the model itself and the data preparation process, which played a crucial role in model training.
Cite as: Sokhatskyi, V., Zvyeryeva, O., Karaulov, I., Tkanov, D. (2019) Embedding-based system for the Text part of CALL v3 shared task. Proc. 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2019), 16-19, doi: 10.21437/SLaTE.2019-4
@inproceedings{sokhatskyi19_slate,
  author={Volodymyr Sokhatskyi and Olga Zvyeryeva and Ievgen Karaulov and Dmytro Tkanov},
  title={{Embedding-based system for the Text part of CALL v3 shared task}},
  year=2019,
  booktitle={Proc. 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2019)},
  pages={16--19},
  doi={10.21437/SLaTE.2019-4}
}