Using Audio Events to Extend a Multi-modal Public Speaking Database with Reinterpreted Emotional Annotations

Esther Rituerto-González, Clara Luis-Mingueza, Carmen Peláez-Moreno

Speech carries a wealth of information about the emotional state of the speaker. Affective Computing is an emerging field that analyses these states with the aim of improving human-computer interaction.

In this paper we present a preliminary study on the analysis of stress in speech and of the acoustic events that may cause it. We combine four speech and audio technologies: speaker recognition, emotion recognition, acoustic event detection, and acoustic event classification, and we explore how they influence each other.

We perform initial experiments on BioSpeech, a multi-modal emotion database that we have extended with acoustic events, and discuss a novel labelling process aimed at improving classification performance.

The current study is intended as a classification and detection baseline for the mono-modal speech tasks described, and it discusses future work and the multi-modal architectures to be implemented in a cyber-physical system for the automatic detection of gender-based violence.
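By way of illustration only (this is not the authors' pipeline), a mono-modal speech classification baseline of the kind referred to above is often built from utterance-level acoustic statistics fed to a standard classifier. The sketch below assumes librosa and scikit-learn are available and uses hypothetical file names and binary stress labels:

```python
# Illustrative sketch only -- not the system described in the paper.
# Assumes librosa and scikit-learn; file names and labels are placeholders.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(wav_path, sr=16000, n_mfcc=20):
    """Mean/std MFCC statistics over an utterance: a common mono-modal baseline."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training utterances with binary labels (0 = neutral, 1 = stressed)
train_files = ["utt_001.wav", "utt_002.wav", "utt_003.wav"]
train_labels = [0, 1, 0]
X_train = np.stack([extract_features(f) for f in train_files])

# Standardised features followed by an RBF-kernel SVM classifier
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
clf.fit(X_train, train_labels)
```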

doi: 10.21437/IberSPEECH.2021-13

Rituerto-González, E., Luis-Mingueza, C., Peláez-Moreno, C. (2021) Using Audio Events to Extend a Multi-modal Public Speaking Database with Reinterpreted Emotional Annotations. Proc. IberSPEECH 2021, 61-65, doi: 10.21437/IberSPEECH.2021-13.