Using an Automated Content Scoring Engine for Spoken CALL Responses: The ETS submission for the Spoken CALL Challenge

Keelan Evanini, Matthew Mulholland, Eugene Tsuprun, Yao Qian


In this study we investigate the performance of an automated content scoring engine for accepting or rejecting learner responses in a spoken CALL application. Specifically, we employed a system based on word and character n-gram features in a support vector classification framework, originally designed for scoring content in written texts, and augmented its feature set with additional features from three categories: prompt bias, text-to-text similarity to reference responses, and automatically detected grammatical errors. This system achieved a D score of 4.353 (compared to a baseline score of 1.694) on the test set consisting of Kaldi ASR output in the 2017 Spoken CALL Challenge. In this paper we also analyze the impact of the size and nature of the training data set (human transcriptions vs. ASR output) on the model's performance and present feature ablation experiments to demonstrate which of the additional features are most helpful.
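To make the modeling approach concrete, the following minimal sketch (a scikit-learn illustration, not the authors' implementation) shows how word and character n-gram features can be combined in a support vector classifier that accepts or rejects a learner response; the example data, n-gram ranges, and hyperparameters are assumptions for illustration only.

# Illustrative sketch only: word + character n-gram features feeding a
# linear support vector classifier for accept/reject decisions.
# Data, n-gram ranges, and hyperparameters are placeholders.
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy training data: learner response text and accept (1) / reject (0) label.
responses = ["i would like a ticket to basel", "ticket i want please basel to"]
labels = [1, 0]

features = FeatureUnion([
    ("word_ngrams", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char_ngrams", CountVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
])

model = Pipeline([("features", features), ("svm", LinearSVC())])
model.fit(responses, labels)

print(model.predict(["i would like a ticket to basel"]))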


DOI: 10.21437/SLaTE.2017-17

Cite as: Evanini, K., Mulholland, M., Tsuprun, E., Qian, Y. (2017) Using an Automated Content Scoring Engine for Spoken CALL Responses: The ETS submission for the Spoken CALL Challenge. Proc. 7th ISCA Workshop on Speech and Language Technology in Education, 97-102, DOI: 10.21437/SLaTE.2017-17.


@inproceedings{Evanini2017,
  author={Keelan Evanini and Matthew Mulholland and Eugene Tsuprun and Yao Qian},
  title={Using an Automated Content Scoring Engine for Spoken CALL Responses: The ETS submission for the Spoken CALL Challenge},
  year=2017,
  booktitle={Proc. 7th ISCA Workshop on Speech and Language Technology in Education},
  pages={97--102},
  doi={10.21437/SLaTE.2017-17},
  url={http://dx.doi.org/10.21437/SLaTE.2017-17}
}