Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions

Vikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft


We present a spoken dialog-based framework for computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a conversational job interview application. We then used crowdsourcing to collect multiple interactions with the system from non-native speakers of English. We analyzed human-rated scores of the recorded dialog data on three scoring dimensions critical to the delivery of conversational English (fluency, pronunciation, and intonation/stress) and further examined the efficacy of automatically extracted, hand-curated speech features in predicting each of these sub-scores. Machine learning experiments showed that the trained scoring models generally perform on par with the human inter-rater agreement baseline in predicting human-rated scores of conversational proficiency.
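
As a minimal illustrative sketch (not the authors' actual pipeline), the scoring setup summarized above can be approximated as: extract per-response speech features, train a regressor against one human rater's sub-scores, and compare machine-human agreement on held-out data with the human-human inter-rater baseline. The feature placeholders, the ridge regressor, and the synthetic data below are assumptions for illustration only.

# Sketch only: predict a human-rated sub-score (e.g., fluency) from
# automatically extracted speech features and compare machine-human
# agreement with a human inter-rater baseline. Features, model choice,
# and data are hypothetical, not taken from the paper.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: one row per dialog response; columns stand in for
# curated speech features (e.g., speaking rate, pause frequency).
n_responses, n_features = 200, 10
X = rng.normal(size=(n_responses, n_features))
rater1 = X[:, 0] + rng.normal(scale=0.5, size=n_responses)   # rater 1 scores
rater2 = rater1 + rng.normal(scale=0.5, size=n_responses)    # rater 2 scores

X_tr, X_te, y_tr, y_te, r2_tr, r2_te = train_test_split(
    X, rater1, rater2, test_size=0.3, random_state=0)

# Train the scoring model on rater 1's scores.
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Machine-human agreement vs. the human inter-rater baseline.
machine_human, _ = pearsonr(pred, y_te)
human_human, _ = pearsonr(r2_te, y_te)
print(f"machine-human r = {machine_human:.2f}, human-human r = {human_human:.2f}")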


DOI: 10.21437/Interspeech.2017-1213

Cite as: Ramanarayanan, V., Lange, P.L., Evanini, K., Molloy, H.R., Suendermann-Oeft, D. (2017) Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions. Proc. Interspeech 2017, 1711-1715, DOI: 10.21437/Interspeech.2017-1213.


@inproceedings{Ramanarayanan2017,
  author={Vikram Ramanarayanan and Patrick L. Lange and Keelan Evanini and Hillary R. Molloy and David Suendermann-Oeft},
  title={Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={1711--1715},
  doi={10.21437/Interspeech.2017-1213},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1213}
}