Identifying laughter and filled pauses is important to understanding spontaneous human speech. Both are common vocal expressions that are non-lexical yet highly communicative. In this paper, we use a two-tiered system for identifying laughter and filled pauses. We first generate frame-level hypotheses and subsequently rescore them using features derived from acoustic syllable segmentation. Using the Interspeech 2013 ComParE challenge corpus, SVC, we find that this rescoring and the inclusion of syllable-based acoustic/prosodic features allow laughter and filled pauses to be detected at 89.3% UAAUC on the development set, an improvement of 1.7% over the challenge baseline.
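For readers unfamiliar with the metric, UAAUC in the ComParE 2013 challenge is the unweighted average of the per-class area under the ROC curve, computed over frames for the laughter and filler classes. The following is a minimal sketch of that computation, not the authors' or the challenge's evaluation code: the function names `auc` and `uaauc` and the toy scores are illustrative assumptions, and the AUC is obtained from the rank-sum statistic rather than by tracing the ROC curve.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic.

    labels are 1 for frames of the target class and 0 for all other frames.
    Tied scores receive the average of the ranks they span.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative frame")

    # Sort frame indices by ascending score and assign 1-based average ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def uaauc(class_scores: Dict[str, Sequence[float]],
          class_labels: Dict[str, Sequence[int]]) -> float:
    """Unweighted mean of the per-class AUCs (each class counts equally)."""
    aucs = [auc(class_scores[c], class_labels[c]) for c in class_scores]
    return sum(aucs) / len(aucs)


# Toy frame-level scores (hypothetical values, four frames per class).
frame_scores = {
    "laughter": [0.1, 0.4, 0.35, 0.8],
    "filler": [0.1, 0.2, 0.8, 0.9],
}
frame_labels = {
    "laughter": [0, 0, 1, 1],
    "filler": [0, 0, 1, 1],
}
print(uaauc(frame_scores, frame_labels))  # 0.875
```

Because each class contributes equally to the mean, a rare class such as laughter weighs as much as a more frequent one, which is why the metric is described as unweighted.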
Bibliographic reference. An, Guozhen / Brizan, David Guy / Rosenberg, Andrew (2013): "Detecting laughter and filled pauses using syllable-based features", In INTERSPEECH-2013, 178-181.