12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

On the Use of Multimodal Cues for the Prediction of Degrees of Involvement in Spontaneous Conversation

Catharine Oertel (1), Stefan Scherer (2), Nick Campbell (1)

(1) Trinity College Dublin, Ireland
(2) Universität Ulm, Germany

Quantifying the degree of involvement of a group of participants in a conversation is a task that humans accomplish every day but that machines are, as yet, unable to perform. In this study we first investigate the correlation between visual cues (gaze and blinking rate) and involvement. We then test the suitability of prosodic cues (acoustic model) as well as gaze and blinking (visual model) for the prediction of the degree of involvement by using a support vector machine (SVM). We also test whether the fusion of the acoustic and the visual model improves the prediction. We show that we are able to predict three classes of involvement with a reduction of error rate of 0.30 (accuracy = 0.68).
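
The abstract does not say how the acoustic and the visual model were fused. As a minimal sketch, assuming score-level (late) fusion in which each model outputs one score per involvement class and the two score vectors are combined by a weighted average, the idea could look like the following. The function name `fuse_scores`, the weight `w_acoustic`, and the example scores are hypothetical illustrations, not values or methods taken from the paper.

```python
from typing import Sequence


def fuse_scores(acoustic: Sequence[float],
                visual: Sequence[float],
                w_acoustic: float = 0.5) -> int:
    """Return the index of the class with the highest fused score.

    Assumption: both models emit one score per class on comparable
    scales (for example, class probabilities). The fusion scheme used
    in the paper is not described in the abstract.
    """
    if len(acoustic) != len(visual):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= w_acoustic <= 1.0:
        raise ValueError("w_acoustic must lie in [0, 1]")
    # Weighted average of the two per-class score vectors.
    fused = [w_acoustic * a + (1.0 - w_acoustic) * v
             for a, v in zip(acoustic, visual)]
    # Argmax over the fused per-class scores.
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three involvement classes (low, medium, high).
acoustic_scores = [0.2, 0.5, 0.3]
visual_scores = [0.1, 0.3, 0.6]
predicted = fuse_scores(acoustic_scores, visual_scores)
```

Setting `w_acoustic` to 1.0 or 0.0 recovers the acoustic-only or visual-only decision, which makes it straightforward to compare each single-modality model against the fused one.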

Bibliographic reference. Oertel, Catharine / Scherer, Stefan / Campbell, Nick (2011): "On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation", in INTERSPEECH-2011, 1541-1544.