Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models

Ricardo Kleinlein, Cristina Luna Jiménez, Juan Manuel Montero, Zoraida Callejas, Fernando Fernández-Martínez


Electrodermal activity (EDA) is a psychophysiological indicator that can be considered a somatic marker of subjects’ emotional and attentional reactions to stimuli such as audiovisual content. EDA measurements are not biased by the cognitive process of giving an opinion or a score to characterize subjective perception, and group-level EDA recordings integrate the reactions of an entire audience, thus reducing signal noise. This paper contributes to the prediction of audience attention to video content, extending previous work on the use of EDA as ground truth for prediction algorithms. Videos are segmented into shorter clips according to whether the audience’s attention is increasing or decreasing, and we process each video’s audio waveform to extract meaningful aural embeddings with a VGGish model pretrained on the AudioSet database. While previous similar work on attention-level prediction using only audio achieved 69.83% accuracy, we propose a Mixture of Experts approach to train a binary classifier that outperforms the main existing state-of-the-art approaches, predicting increasing and decreasing attention levels with 81.76% accuracy. These results confirm the usefulness of acoustic features that carry semantic significance, and the benefit of training experts over partitions of the dataset in order to predict group-level attention from audio.
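
As an illustration of the Mixture of Experts idea mentioned above, the following minimal Python sketch combines several binary experts through a softmax gate to score one clip as increasing or decreasing attention. It is not the model described in the paper: the experts here are plain logistic regressors over a mean-pooled clip embedding rather than LSTMs, the soft gating scheme is an assumption, all weights are random and untrained, and every name (`moe_predict`, `NUM_EXPERTS`, and so on) is hypothetical. The only detail taken from VGGish itself is the 128-dimensional size of its frame embeddings.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(weights, features):
    """Inner product of two equal-length vectors."""
    return sum(w * x for w, x in zip(weights, features))


def mean_pool(frames):
    """Average a clip's sequence of frame embeddings into one vector."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def moe_predict(frames, expert_weights, gate_weights):
    """Return P(increasing attention) and the gate weights for one clip.

    Assumption: each expert is a logistic regressor and the gate is a
    softmax over linear scores; the paper's experts are LSTM-based.
    """
    pooled = mean_pool(frames)
    gates = softmax([dot(g, pooled) for g in gate_weights])
    expert_probs = [sigmoid(dot(w, pooled)) for w in expert_weights]
    # The mixture output is the gate-weighted average of expert outputs.
    p_increase = sum(g * p for g, p in zip(gates, expert_probs))
    return p_increase, gates


random.seed(0)
EMBED_DIM = 128   # size of one VGGish frame embedding
NUM_EXPERTS = 3   # hypothetical number of experts
NUM_FRAMES = 10   # hypothetical number of embedding frames in the clip

# Synthetic stand-ins for real embeddings and trained parameters.
clip = [[random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
        for _ in range(NUM_FRAMES)]
expert_weights = [[random.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
                  for _ in range(NUM_EXPERTS)]
gate_weights = [[random.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
                for _ in range(NUM_EXPERTS)]

p_increase, gates = moe_predict(clip, expert_weights, gate_weights)
label = "increasing" if p_increase >= 0.5 else "decreasing"
print(label, round(p_increase, 3), [round(g, 3) for g in gates])
```

In a trained system the gate would learn which expert to trust for which region of the embedding space, which is one way to realize experts specialized on partitions of the dataset.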


DOI: 10.21437/Interspeech.2019-2799

Cite as: Kleinlein, R., Jiménez, C.L., Montero, J.M., Callejas, Z., Fernández-Martínez, F. (2019) Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models. Proc. Interspeech 2019, 61-65, DOI: 10.21437/Interspeech.2019-2799.


@inproceedings{Kleinlein2019,
  author={Ricardo Kleinlein and Cristina Luna Jiménez and Juan Manuel Montero and Zoraida Callejas and Fernando Fernández-Martínez},
  title={{Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={61--65},
  doi={10.21437/Interspeech.2019-2799},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2799}
}