In this paper, we present results for the task of Automatic Public Speech Assessment (APSA). Given the comparatively sparse work carried out on this task to date, a novel database was required for training and evaluating machine learning models. As a basis, the freely available oral presentations of the ICASSP conference in 2011 were selected because their transcriptions include non-verbal vocalisations. Five raters labelled the data in terms of the perceived oratory ability of the speakers on a 5-point Public Speaking Skill Rating Likert scale. We investigate the feasibility of speaker-independent APSA using different standardised acoustic feature sets computed per fixed chunk of an oral presentation in a series of ternary classification and continuous regression experiments. Further, we compare the relevance of different feature groups related to fluency (speech/hesitation rate), prosody, voice quality and a variety of spectral features. Our results demonstrate that oratory skills can be reliably assessed using supra-segmental audio features, with prosodic ones being particularly well suited.
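To illustrate what a speaker-independent evaluation over chunk-level instances might look like, the following is a minimal sketch only. It assumes a leave-one-speaker-out scheme, which the abstract does not specify; the function name `leave_one_speaker_out` and the speaker labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_idx, test_idx) folds.

    Each fold holds out every chunk of one speaker, so no speaker
    contributes chunks to both the training and the test partition.
    """
    # Group chunk indices by the speaker they belong to.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    # One fold per speaker, in a deterministic (sorted) order.
    for held_out in sorted(by_speaker):
        test_idx = by_speaker[held_out]
        train_idx = sorted(
            i
            for spk, idxs in by_speaker.items()
            if spk != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical data: one speaker label per fixed-length chunk.
chunk_speakers = ["spk_a", "spk_a", "spk_b", "spk_c", "spk_b", "spk_c"]
folds = list(leave_one_speaker_out(chunk_speakers))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by speaker rather than by chunk matters here because several chunks come from the same presentation; a random chunk-level split would let a model see the test speaker's voice during training.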
Bibliographic reference. Azaïs, Lucas / Payan, Adrien / Sun, Tianjiao / Vidal, Guillaume / Zhang, Tina / Coutinho, Eduardo / Eyben, Florian / Schuller, Björn (2015): "Does my speech rock? Automatic assessment of public speaking skills", in INTERSPEECH-2015, 2519-2523.