8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Detecting User Engagement in Everyday Conversations

Chen Yu (1), Paul Aoki (2), Allison Woodruff (2)

(1) University of Rochester, USA
(2) Palo Alto Research Center, USA

This paper presents a novel application of speech emotion recognition: estimating the level of conversational engagement between users of a voice communication system. We begin by using machine learning techniques, such as the support vector machine (SVM), to classify users' emotions as expressed in individual utterances. Utterance-level classification alone, however, fails to model the temporal and interactive aspects of conversational engagement. We therefore propose a multilevel structure based on coupled hidden Markov models (CHMM) to estimate engagement levels in continuous natural speech. The first level consists of SVM-based classifiers that recognize emotional states, which may be discrete emotion types or arousal/valence levels. A high-level HMM then takes these emotional states as input and estimates users' engagement in conversation by decoding its internal states. We report experimental results obtained by applying our algorithms to the LDC Emotional Prosody and CallFriend speech corpora.
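
The second level described above can be illustrated with a minimal sketch. It assumes that the first-level classifiers have already produced one discrete emotion label per utterance, and it decodes a hidden engagement state for each utterance with the standard Viterbi algorithm. It uses a single plain HMM for one speaker rather than the coupled HMM of the paper, and the label sets, state names, and all probabilities are hypothetical values chosen for illustration, not figures from the paper.

```python
import math

# Hypothetical label sets; the abstract does not list these exact names.
EMOTIONS = ["neutral", "happy", "sad"]   # assumed outputs of the first-level classifiers
ENGAGEMENT = ["low", "high"]             # assumed hidden states of the high-level HMM

# Illustrative, made-up probabilities (not taken from the paper).
START = {"low": 0.5, "high": 0.5}
TRANS = {"low": {"low": 0.8, "high": 0.2},
         "high": {"low": 0.2, "high": 0.8}}
EMIT = {"low": {"neutral": 0.7, "happy": 0.1, "sad": 0.2},
        "high": {"neutral": 0.2, "happy": 0.6, "sad": 0.2}}


def viterbi(observations):
    """Return the most likely engagement state for each observed emotion label."""
    if not observations:
        return []
    # Log-probability of the best path ending in each state at the first step.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
              for s in ENGAGEMENT}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in ENGAGEMENT:
            # Best predecessor state for a path that ends in state s.
            best_prev = max(ENGAGEMENT,
                            key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[best_prev]
                             + math.log(TRANS[best_prev][s])
                             + math.log(EMIT[s][obs]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(ENGAGEMENT, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


labels = ["neutral", "neutral", "happy", "happy", "happy"]
print(viterbi(labels))
```

With these made-up parameters, a run of "happy" utterances following neutral ones moves the decoded state from low to high engagement. A coupled HMM would additionally condition each speaker's state transitions on the other speaker's previous state, which this sketch omits.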


Bibliographic reference. Yu, Chen / Aoki, Paul / Woodruff, Allison (2004): "Detecting user engagement in everyday conversations", in INTERSPEECH-2004, 1329-1332.