Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis

Gábor Gosztolya, Tamás Grósz, György Szaszák, László Tóth


In the Sincerity Sub-Challenge of the Interspeech ComParE 2016 Challenge, the task is to estimate user-annotated sincerity scores for speech samples. We interpret this challenge as a rank-learning regression task, since the evaluation metric (Spearman’s correlation) is calculated from the ranks of the instances. As a first approach, we train Deep Neural Networks with a novel error criterion that maximizes the correlation metric directly. We obtained the best performance by combining the proposed error function with the conventional mean squared error (MSE). This approach yielded results that outperform the baseline on the Challenge test set. Furthermore, we introduce a compact prosodic feature set based on a dynamic representation of F0, energy and sound duration. We extract syllable-based prosodic features, which serve as the input to a separate machine learning step. We show that a small set of prosodic features yields a result very close to the baseline, and that combining the predictions of the DNN with those obtained from the prosodic feature set brings further improvement, significantly outperforming the baseline SVR on the Challenge test set.
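To make the rank-based nature of the evaluation metric concrete, the following minimal Python sketch computes Spearman's correlation as the Pearson correlation of the ranks. It also shows one possible way of mixing a correlation term with MSE. The abstract does not specify the form of the proposed error criterion, so `combined_loss`, its use of Pearson correlation as a stand-in for the rank correlation, and the weight `alpha` are illustrative assumptions and not the authors' method.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Average the ranks inside each group of tied values.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def spearman(x, y):
    """Spearman's correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def combined_loss(pred, target, alpha=0.5):
    """Hypothetical objective: alpha * MSE + (1 - alpha) * (1 - correlation).

    Assumption for illustration only: Pearson correlation replaces the
    rank correlation, and alpha is an arbitrary mixing weight.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = float(np.mean((pred - target) ** 2))
    return alpha * mse + (1.0 - alpha) * (1.0 - pearson(pred, target))
```

For example, `spearman([1, 2, 3, 4], [10, 20, 30, 400])` is 1.0 because the two sequences have the same ordering, even though their MSE is large. This is why a loss that only minimizes squared error does not necessarily optimize the Challenge metric.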


DOI: 10.21437/Interspeech.2016-956

Cite as

Gosztolya, G., Grósz, T., Szaszák, G., Tóth, L. (2016) Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis. Proc. Interspeech 2016, 2026-2030.

BibTeX
@inproceedings{Gosztolya+2016,
author={Gábor Gosztolya and Tamás Grósz and György Szaszák and László Tóth},
title={Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-956},
url={http://dx.doi.org/10.21437/Interspeech.2016-956},
pages={2026--2030}
}