In this paper, we introduce results for the task of Automatic Public Speech Assessment (APSA). Given the comparatively sparse work carried out on this task to date, a novel database was required for training and evaluating machine learning models. As a basis, the freely available oral presentations of the ICASSP conference in 2011 were selected because their transcriptions include non-verbal vocalisations. The data was specifically labelled in terms of the perceived oratory ability of the speakers by five raters on a 5-point Public Speaking Skill Rating Likert scale. We investigate the feasibility of speaker-independent APSA using different standardised acoustic feature sets computed per fixed chunk of an oral presentation, in a series of ternary classification and continuous regression experiments. Further, we compare the relevance of different feature groups related to fluency (speech/hesitation rate), prosody, voice quality, and a variety of spectral features. Our results demonstrate that public speaking skills can be reliably assessed using supra-segmental audio features, with prosodic features being particularly well suited.
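The abstract does not specify which standardised feature sets, chunk length, or learners were used, so the following is only a minimal sketch of the chunk-wise, speaker-independent setup it describes. It assumes the eGeMAPS functionals from the opensmile Python package (a later tool than the openSMILE version available in 2015), a 10-second chunk length, scikit-learn SVMs as the classifier/regressor, and hypothetical inputs (`wav_path`, `y_cls`, `y_reg`, `groups`); none of these choices are confirmed by the paper.

```python
# Sketch only: feature set, chunk length, and models are illustrative
# assumptions, not the authors' exact configuration.
import numpy as np
import opensmile
import soundfile as sf
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR

CHUNK_S = 10.0  # fixed chunk length in seconds (assumed)

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,   # supra-segmental functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

def chunk_features(wav_path):
    """Extract one functional feature vector per fixed-length chunk."""
    audio, sr = sf.read(wav_path)
    hop = int(CHUNK_S * sr)
    chunks = [audio[i:i + hop] for i in range(0, len(audio) - hop + 1, hop)]
    return np.vstack([smile.process_signal(c, sr).to_numpy() for c in chunks])

def evaluate(X, y_cls, y_reg, groups):
    """Ternary classification and continuous regression, speaker-independent.

    X: chunk-level feature matrix; y_cls: ternary skill label per chunk;
    y_reg: continuous rater score per chunk; groups: speaker IDs, so that
    no speaker appears in both training and test folds (all hypothetical).
    """
    cv = GroupKFold(n_splits=5)  # speaker-independent cross-validation
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    reg = make_pipeline(StandardScaler(), SVR(kernel="linear"))
    acc = cross_val_score(clf, X, y_cls, cv=cv, groups=groups,
                          scoring="balanced_accuracy")
    r2 = cross_val_score(reg, X, y_reg, cv=cv, groups=groups, scoring="r2")
    return acc.mean(), r2.mean()
```

The feature-group comparison mentioned above could then be approximated by rerunning `evaluate` on column subsets of `X` selected by feature name (e.g. F0- and energy-related columns for prosody, jitter/shimmer/HNR for voice quality), though the paper's exact grouping is not given here.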
Cite as: Azaïs, L., Payan, A., Sun, T., Vidal, G., Zhang, T., Coutinho, E., Eyben, F., Schuller, B. (2015) Does my speech rock? automatic assessment of public speaking skills. Proc. Interspeech 2015, 2519-2523, doi: 10.21437/Interspeech.2015-543
@inproceedings{azais15_interspeech,
  author={Lucas Azaïs and Adrien Payan and Tianjiao Sun and Guillaume Vidal and Tina Zhang and Eduardo Coutinho and Florian Eyben and Björn Schuller},
  title={{Does my speech rock? automatic assessment of public speaking skills}},
  year={2015},
  booktitle={Proc. Interspeech 2015},
  pages={2519--2523},
  doi={10.21437/Interspeech.2015-543}
}