10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Recognising Interest in Conversational Speech - Comparing Bag of Frames and Supra-Segmental Features

Björn Schuller, Gerhard Rigoll

Technische Universität München, Germany

It is commonly accepted that affective and emotion-related states are acoustically well modelled on a supra-segmental level. Nonetheless, successes have also been reported for frame-level processing, by means of either dynamic classification or multi-instance learning techniques. In this work, a quantitative, feature-type-wise comparison between frame-level and supra-segmental analysis is carried out for the recognition of interest in human conversational speech. To shed light on the respective differences, the same classifier, namely Support Vector Machines, is used in both cases: once by clustering a ‘bag of frames’ of unknown sequence length with Multi-Instance Learning techniques, and once by applying statistical functionals that project the time series onto a static feature vector. The Audiovisual Interest Corpus of naturalistic interest serves as the database.
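The two input representations contrasted above can be illustrated with a minimal sketch. It assumes a speech turn is given as a dictionary of equally long frame-level low-level descriptor (LLD) contours, for example energy and pitch per frame. The names `functionals`, `supra_segmental_vector`, `bag_of_frames` and `classify_bag`, as well as the five chosen statistics, are hypothetical illustrations and not the paper's feature set. The bag-level decision simply averages per-frame scores from a placeholder `frame_score` function, which stands in for a trained classifier such as an SVM decision function. This averaging rule is only one simple multi-instance assumption; it is not the clustering-based technique used in the paper.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

# A frame-level low-level descriptor (LLD) contour, e.g. energy or pitch per frame.
Contour = Sequence[float]

# Hypothetical, illustrative set of statistical functionals.
FUNCTIONAL_NAMES = ("mean", "std", "min", "max", "range")


def functionals(contour: Contour) -> Dict[str, float]:
    """Project a variable-length contour onto a fixed set of statistics."""
    if not contour:
        raise ValueError("contour must contain at least one frame")
    lo, hi = min(contour), max(contour)
    return {
        "mean": mean(contour),
        "std": pstdev(contour),
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }


def supra_segmental_vector(llds: Dict[str, Contour]) -> List[float]:
    """Concatenate the functionals of every LLD into one static feature vector.

    The vector length depends only on the number of LLDs, not on the turn length.
    """
    vector: List[float] = []
    for name in sorted(llds):  # fixed LLD order keeps dimensions comparable across turns
        stats = functionals(llds[name])
        vector.extend(stats[key] for key in FUNCTIONAL_NAMES)
    return vector


def bag_of_frames(llds: Dict[str, Contour]) -> List[List[float]]:
    """Keep every frame as one instance; the bag size equals the frame count."""
    names = sorted(llds)
    n_frames = len(llds[names[0]])
    if any(len(llds[n]) != n_frames for n in names):
        raise ValueError("all LLD contours must have the same number of frames")
    return [[llds[n][t] for n in names] for t in range(n_frames)]


def classify_bag(bag: List[List[float]],
                 frame_score: Callable[[List[float]], float]) -> int:
    """Naive multi-instance decision: average per-frame scores, threshold at 0.

    `frame_score` is a placeholder for a trained frame-level classifier.
    """
    return 1 if mean(frame_score(frame) for frame in bag) > 0 else 0


if __name__ == "__main__":
    turn = {"energy": [0.2, 0.5, 0.9, 0.4], "pitch": [180.0, 200.0, 220.0, 190.0]}
    print(len(supra_segmental_vector(turn)))  # 2 LLDs x 5 functionals
    print(len(bag_of_frames(turn)))           # one instance per frame
```

For this toy turn, the supra-segmental representation always has ten dimensions, whatever the turn length. The bag, by contrast, grows with the number of frames, which is why a multi-instance method is needed to classify it.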


Bibliographic reference.  Schuller, Björn / Rigoll, Gerhard (2009): "Recognising interest in conversational speech - comparing bag of frames and supra-segmental features", In INTERSPEECH-2009, 1999-2002.