For many applications in human-computer interaction, it is desirable to predict between-speaker silences (gaps) and within-speaker silences (pauses) independently of automatic speech recognition (ASR). In this study, we focus on a dataset of 6 dyadic task-based interactions and aim to discriminate gaps from pauses automatically, based on F0, energy and glottal parameters derived from the speech just preceding the silence. Initial manual annotation reveals strong discriminative power of intonation tune types. In a subsequent automatic analysis using descriptive statistics of parameter contours, as well as modelling of such contours using principal component analysis, we are able to predict pauses and gaps speaker-independently with an accuracy of 70%, compared to a 56% baseline.
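As a rough illustration of two ideas mentioned in the abstract, the sketch below computes a few descriptive statistics for a single parameter contour and builds leave-one-speaker-out splits, so that evaluation stays speaker-independent. It is not the authors' feature set or evaluation protocol: the chosen statistics, the function names `contour_stats` and `leave_one_speaker_out`, and the toy F0 values are all assumptions made for illustration.

```python
from statistics import mean

def contour_stats(contour):
    """Summarise one parameter contour (e.g. F0 samples preceding a silence).

    Assumes at least two samples, otherwise the slope is undefined.
    """
    n = len(contour)
    t_mean = (n - 1) / 2
    c_mean = mean(contour)
    # Least-squares slope over the sample index: a rough measure of rise or fall.
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (c - c_mean) for t, c in enumerate(contour)) / denom
    return {
        "mean": c_mean,
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
        "slope": slope,
    }

def leave_one_speaker_out(speaker_ids):
    """Yield (held_out, train_idx, test_idx) with no speaker in both sets."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical toy data: one falling and one rising F0 contour, in Hz.
falling = contour_stats([200.0, 190.0, 180.0, 170.0])
rising = contour_stats([170.0, 180.0, 190.0, 200.0])
# Hypothetical speaker labels for four speech chunks.
folds = list(leave_one_speaker_out(["A", "A", "B", "C"]))
```

Each fold holds out every chunk from one speaker, so a classifier trained on the remaining chunks is always tested on a voice it has not seen.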
Bibliographic reference. Kane, John / Yanushevskaya, Irena / Looze, Céline de / Vaughan, Brian / Chasaide, Ailbhe Ní (2014): "Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions", In INTERSPEECH-2014, 333-337.