We investigate acoustic modeling, feature extraction, and feature selection for the problem of affective content recognition of generic, non-speech, non-music sounds. We annotate and analyze a database of generic sounds drawn from the BBC sound effects library. We use regression models, long-term features, and wrapper-based feature selection to model affect in the continuous 3-D (arousal, valence, dominance) emotional space. Frame-level features are extracted from each audio clip and combined with statistical functionals to estimate long-term temporal patterns over the duration of the clip. Experimental results show that the regression models match the categorical performance of the more popular Gaussian Mixture Models. They are also capable of predicting accurate affective ratings on continuous scales, achieving 62.67% 3-class accuracy and 0.69-0.75 correlation with human ratings, higher than comparable numbers in the literature.
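The functional-based pipeline described above (frame-level features summarized into fixed-length clip-level vectors, then mapped to continuous affect ratings by regression) can be sketched as follows. This is an illustrative sketch using NumPy only; the specific functionals, feature dimensions, and simulated data are assumptions, not details taken from the paper.

```python
import numpy as np

def apply_functionals(frames):
    """frames: (n_frames, n_features) frame-level features for one clip.
    Returns a fixed-length clip-level vector via statistical functionals.
    The choice of functionals here (mean, std, min, max) is illustrative."""
    return np.concatenate([
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.min(axis=0),
        frames.max(axis=0),
    ])

rng = np.random.default_rng(0)
n_clips, n_feats = 40, 5
# Simulated frame-level features; clips have variable numbers of frames,
# but the functionals yield a fixed-length vector per clip.
clips = [rng.normal(size=(int(rng.integers(50, 100)), n_feats))
         for _ in range(n_clips)]
X = np.stack([apply_functionals(c) for c in clips])   # (n_clips, 4 * n_feats)

# Simulated continuous ratings (e.g. arousal) with a linear dependence on X.
w_true = rng.normal(size=X.shape[1])
y = X @ w_true + 0.1 * rng.normal(size=n_clips)

# Least-squares linear regression, one model per affective dimension.
Xb = np.hstack([X, np.ones((n_clips, 1))])            # append bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
pred = Xb @ w
r = np.corrcoef(pred, y)[0, 1]                        # correlation with ratings
print(f"correlation: {r:.2f}")
```

In practice one such regressor would be trained per emotional dimension (arousal, valence, dominance), with wrapper-based feature selection choosing which functional features enter the model.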
Bibliographic reference. Malandrakis, Nikolaos / Sundaram, Shiva / Potamianos, Alexandros (2013): "Affective classification of generic audio clips using regression models", In INTERSPEECH-2013, 2832-2836.