Although rhythmic speech analysis is known to bear great potential for the recognition of emotion, it is often omitted or reduced to speaking rate or segmental durations. An obvious explanation is that characterising speech rhythm is not an easy task in itself, and there exist many types of rhythmic information. In this paper, we study advanced methods to define novel metrics of speech rhythm. Their ability to characterise spontaneous emotions is demonstrated on the recent Audio/Visual Emotion Challenge task, comprising 3.6 hours of natural, affective human conversational speech. Emotion is assessed along the four dimensions Arousal, Expectancy, Valence, and Power as binary classification tasks at the word level. We compare our new rhythmic feature types to the official 2k brute-force acoustic baseline feature set of the Audio Sub-Challenge. In the results, the rhythmic features achieve a promising relative improvement of 16% for Valence, whereas performance is more mixed for the three other dimensions.
Index Terms: speech rhythm, prosodic features, emotion recognition
Bibliographic reference: Ringeval, Fabien / Chetouani, Mohamed / Schuller, Björn (2012): "Novel metrics of speech rhythm for the assessment of emotion", in INTERSPEECH-2012, 346-349.