In this study, we present a model to detect user confusion in online interview dialogues conducted by a conversational agent. Conversational agents have gained attention as a means of reliably assessing language learners' oral skills in interviews. Learners often experience confusion, in which they fail to understand what the system has said and may be unable to respond, leading to a conversational breakdown. It is thus crucial for the system to detect such a state and keep the interview moving forward by repeating or rephrasing the previous system utterance. To this end, we first collected a dataset of user confusion using a psycholinguistic experimental approach and identified seven multimodal signs of confusion, some of which were unique to online conversation. With the corresponding features, we trained a classification model of user confusion. An ablation study showed that features related to self-talk and gaze direction were the most predictive. We discuss how this model can help a conversational agent detect and resolve user confusion in real time.
Cite as: Saeki, M., Miyagi, K., Fujie, S., Suzuki, S., Ogawa, T., Kobayashi, T., Matsuyama, Y. (2022) Confusion Detection for Adaptive Conversational Strategies of An Oral Proficiency Assessment Interview Agent. Proc. Interspeech 2022, 3988-3992, doi: 10.21437/Interspeech.2022-10075
@inproceedings{saeki22d_interspeech,
  author={Mao Saeki and Kotoka Miyagi and Shinya Fujie and Shungo Suzuki and Tetsuji Ogawa and Tetsunori Kobayashi and Yoichi Matsuyama},
  title={{Confusion Detection for Adaptive Conversational Strategies of An Oral Proficiency Assessment Interview Agent}},
  year={2022},
  booktitle={Proc. Interspeech 2022},
  pages={3988--3992},
  doi={10.21437/Interspeech.2022-10075}
}