Quantifying the degree of involvement of a group of participants in a conversation is a task which humans accomplish every day, but it is something that, as of yet, machines are unable to do. In this study we first investigate the correlation between visual cues (gaze and blinking rate) and involvement. We then test the suitability of prosodic cues (acoustic model) as well as gaze and blinking (visual model) for the prediction of the degree of involvement by using a support vector machine (SVM). We also test whether the fusion of the acoustic and the visual model improves the prediction. We show that we are able to predict three classes of involvement with a reduction of the error rate of 0.30 (accuracy = 0.68).
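The setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: it trains an SVM on synthetic acoustic features, on synthetic visual features (stand-ins for gaze and blinking rate), and on their early fusion by feature concatenation, then compares accuracies on a three-class involvement label. The feature dimensions, noise levels, and use of scikit-learn are all assumptions for the sketch; the original study used real prosodic and gaze/blink measurements.

```python
# Hypothetical sketch of SVM-based involvement prediction with
# acoustic, visual, and fused (concatenated) feature sets.
# All data below is synthetic; only the overall pipeline mirrors the paper.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 3, n)  # three involvement classes, as in the paper

# Synthetic features loosely correlated with the class label (assumption):
acoustic = y[:, None] + rng.normal(0.0, 1.0, (n, 4))  # stand-in prosodic cues
visual = y[:, None] + rng.normal(0.0, 1.5, (n, 2))    # stand-in gaze / blink rate
fused = np.hstack([acoustic, visual])                 # early (feature-level) fusion

def accuracy(X, y):
    """Train an RBF-kernel SVM on a split and return held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

for name, X in [("acoustic", acoustic), ("visual", visual), ("fused", fused)]:
    print(f"{name}: {accuracy(X, y):.2f}")
```

Early fusion is only one way to combine the models; the per-model accuracies printed here make it easy to check whether concatenating the feature sets actually helps on a given dataset.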
Bibliographic reference. Oertel, Catharine / Scherer, Stefan / Campbell, Nick (2011): "On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation", In INTERSPEECH-2011, 1541-1544.