ISCA Archive Interspeech 2013

Paralinguistic event detection from speech using probabilistic time-series smoothing and masking

Rahul Gupta, Kartik Audhkhasi, Sungbok Lee, Shrikanth Narayanan

Non-verbal speech cues serve multiple functions in human interaction, such as maintaining conversational flow and expressing emotions, personality, and interpersonal attitude. In particular, non-verbal vocalizations such as laughter are associated with affective expression, while vocal fillers are used to hold the floor during a conversation. The Interspeech 2013 Social Signals Sub-Challenge involves detecting these two types of non-verbal signals in telephonic speech dialogs. We extend the challenge baseline system by applying filtering and masking techniques to probabilistic time series representing the occurrence of a vocal event. On the test set, we obtain an improved area under the receiver operating characteristic (ROC) curve of 93.3% (10.4% absolute improvement) for laughter and 89.7% (6.1% absolute improvement) for fillers. This improvement suggests the importance of temporal context for detecting these paralinguistic events.
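The abstract does not say which filter or mask the authors use, so the following is only an illustrative sketch of the general idea of smoothing and masking a per-frame probability series. It assumes a centered moving-average filter and a simple threshold mask. The function names, window size, threshold, and sample probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List


def moving_average(probs: List[float], window: int = 5) -> List[float]:
    """Smooth a per-frame probability series with a centered moving average.

    Windows are truncated at the edges, so the output keeps the input length.
    This filter is an assumption; the paper's actual filter may differ.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return smoothed


def mask_low_confidence(probs: List[float], threshold: float = 0.4) -> List[float]:
    """Set frames below the threshold to zero (an assumed masking rule)."""
    return [p if p >= threshold else 0.0 for p in probs]


# Hypothetical frame-level event probabilities from a baseline classifier:
# one isolated spike (frame 2) and one sustained event (frames 6 to 9).
frame_probs = [0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.8, 0.9, 0.7, 0.8, 0.0, 0.0]

smoothed = moving_average(frame_probs, window=3)
masked = mask_low_confidence(smoothed, threshold=0.4)
```

In this toy series, smoothing spreads the isolated spike thinly across its neighbors so the mask removes it, while the sustained run of high-probability frames survives. This is one way temporal context can separate short spurious detections from longer events.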


doi: 10.21437/Interspeech.2013-61

Cite as: Gupta, R., Audhkhasi, K., Lee, S., Narayanan, S. (2013) Paralinguistic event detection from speech using probabilistic time-series smoothing and masking. Proc. Interspeech 2013, 173-177, doi: 10.21437/Interspeech.2013-61

@inproceedings{gupta13_interspeech,
  author={Rahul Gupta and Kartik Audhkhasi and Sungbok Lee and Shrikanth Narayanan},
  title={{Paralinguistic event detection from speech using probabilistic time-series smoothing and masking}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={173--177},
  doi={10.21437/Interspeech.2013-61}
}