13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Novel Metrics of Speech Rhythm for the Assessment of Emotion

Fabien Ringeval (1,3), Mohamed Chetouani (2), Björn Schuller (3)

(1) DIVA group, Department of Informatics, University of Fribourg, Switzerland
(2) Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie, Paris, France
(3) Institute for Human-Machine Communication, Technische Universität München, Germany

Although rhythmic speech analysis is known to bear great potential for the recognition of emotion, it is often omitted or reduced to the speaking rate or segmental durations. A likely explanation is that characterising speech rhythm is itself not an easy task, and that many types of rhythmic information exist. In this paper, we study advanced methods to define novel metrics of speech rhythm. Their ability to characterise spontaneous emotions is demonstrated on the recent Audio/Visual Emotion Challenge task, using 3.6 hours of natural human affective conversational speech. Emotion is assessed along the four dimensions Arousal, Expectancy, Valence, and Power, each treated as a binary classification task at the word level. We compare our new rhythmic feature types to the official 2k brute-force acoustic baseline feature set of the Audio Sub-Challenge. In our results, the rhythmic features achieve a promising relative improvement of 16% for Valence, whereas performance is more mixed for the other three dimensions.

Index Terms: speech rhythm, prosodic features, emotion recognition

