This paper presents a multimodal approach to predicting affective dimensions that makes full use of features from audio, video, electrodermal activity (EDA), and electrocardiogram (ECG) signals with three regression techniques: support vector regression (SVR), partial least squares (PLS) regression, and deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN) regression. Each of the three techniques performs affective dimension prediction on the four modalities, followed by an SVR-based fusion of the different models across modalities; a further SVR performs the final fusion of the three regression systems. Experiments show that the proposed approach obtains promising results on the AVEC 2015 benchmark dataset for multimodal affective dimension prediction. On the development set, it achieves a concordance correlation coefficient (CCC) of 0.856 for arousal and 0.720 for valence, improvements of 3.88% and 4.66%, respectively, over the top performer of AVEC 2015.
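The two-stage fusion described above can be sketched in a minimal form: one regressor per modality, then an SVR that fuses the per-modality predictions, scored with the CCC. This is an illustrative sketch only, assuming scikit-learn's `SVR` and synthetic data; the feature sizes, hyperparameters, and toy target are assumptions, not the paper's actual setup (the paper also uses PLS and DBLSTM-RNN in the first stage).

```python
# Minimal sketch of a two-stage SVR fusion, evaluated with the
# concordance correlation coefficient (CCC). All data, feature sizes,
# and hyperparameters are illustrative assumptions, not the paper's.
import numpy as np
from sklearn.svm import SVR

def ccc(x, y):
    """Concordance correlation coefficient (Lin, 1989)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)

rng = np.random.default_rng(0)
n = 200
# Synthetic per-modality feature matrices (audio, video, EDA, ECG).
feats = {
    "audio": rng.normal(size=(n, 10)),
    "video": rng.normal(size=(n, 8)),
    "eda":   rng.normal(size=(n, 4)),
    "ecg":   rng.normal(size=(n, 4)),
}
# Toy continuous affective target (stand-in for an arousal trace).
y = sum(X[:, 0] for X in feats.values()) + 0.1 * rng.normal(size=n)

# Stage 1: one regressor per modality.
models = {name: SVR(kernel="rbf").fit(X, y) for name, X in feats.items()}
preds = np.column_stack(
    [models[name].predict(X) for name, X in feats.items()]
)

# Stage 2: an SVR fuses the per-modality predictions into a final trace.
fusion = SVR(kernel="linear").fit(preds, y)
y_hat = fusion.predict(preds)
```

A linear kernel is a common choice for the fusion stage, since its inputs are already predictions on the target scale.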
Cite as: Huang, D.-Y., Ding, W., Xu, M., Ming, H., Dong, M., Yu, X., Li, H. (2017) Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques. Proc. Interspeech 2017, 162-165, doi: 10.21437/Interspeech.2017-1088
@inproceedings{huang17_interspeech,
  author={D.-Y. Huang and Wan Ding and Mingyu Xu and Huaiping Ming and Minghui Dong and Xinguo Yu and Haizhou Li},
  title={{Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques}},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={162--165},
  doi={10.21437/Interspeech.2017-1088}
}