Factor Analysis Based Speaker Normalisation for Continuous Emotion Prediction

Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah

Speaker variability has been shown to be a significant confounding factor in speech-based emotion classification systems, and a number of speaker normalisation techniques have been proposed. However, speaker normalisation has not been explored in systems that predict continuous multidimensional descriptions of emotion, such as arousal and valence. This paper investigates the effect of speaker variability in such speech-based continuous emotion prediction systems and proposes a factor-analysis-based speaker normalisation technique. The technique operates directly on the feature space and decomposes it into speaker-specific and emotion-specific sub-spaces. It is validated on the USC CreativeIT and SEMAINE databases, where it improves arousal prediction by 8.2% and 11.0% respectively, measured in terms of correlation coefficient.
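As a rough illustration of normalising in the feature space, the sketch below removes an estimated speaker sub-space from frame-level features. It is not the factor analysis model of the paper, whose details the abstract does not give. It is a simplified stand-in that assumes the speaker sub-space can be approximated by the leading principal directions of the per-speaker mean vectors. The function name `remove_speaker_subspace` and the parameter `rank` are hypothetical.

```python
import numpy as np


def remove_speaker_subspace(features, speaker_ids, rank=1):
    """Project features onto the complement of an estimated speaker sub-space.

    Assumption (not from the paper): the speaker sub-space is spanned by the
    top `rank` right singular vectors of the matrix of per-speaker means.

    features    : array of shape (n_frames, n_dims)
    speaker_ids : array of shape (n_frames,), one speaker label per frame
    Returns the normalised features and the orthonormal speaker basis.
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)

    # Centre on the global mean so speaker means are offsets from it.
    global_mean = features.mean(axis=0)
    centered = features - global_mean

    # One mean offset vector per speaker, shape (n_speakers, n_dims).
    speakers = np.unique(speaker_ids)
    speaker_means = np.stack(
        [centered[speaker_ids == s].mean(axis=0) for s in speakers]
    )

    # Rows of vt are orthonormal directions ordered by how much
    # between-speaker variation they explain.
    _, _, vt = np.linalg.svd(speaker_means, full_matrices=False)
    basis = vt[:rank]

    # Subtract the component of every frame that lies in the speaker sub-space.
    normalised = centered - centered @ basis.T @ basis
    return normalised + global_mean, basis
```

The normalised features would then be passed to whatever regressor predicts arousal or valence. In this sketch `rank` must be chosen by hand and should not exceed the number of speakers or feature dimensions.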

DOI: 10.21437/Interspeech.2016-880

Cite as

Dang, T., Sethu, V., Ambikairajah, E. (2016) Factor Analysis Based Speaker Normalisation for Continuous Emotion Prediction. Proc. Interspeech 2016, 913-917.

author={Ting Dang and Vidhyasaharan Sethu and Eliathamby Ambikairajah},
title={Factor Analysis Based Speaker Normalisation for Continuous Emotion Prediction},
booktitle={Interspeech 2016},