Investigating Utterance Level Representations for Detecting Intent from Acoustics

SaiKrishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg, Alan W Black


Recognizing paralinguistic cues from speech has applications in varied domains of speech processing. In this paper we present approaches to identifying expressed intent from acoustics in the context of the INTERSPEECH 2018 ComParE challenge. We made submissions to three sub-challenges: 1) Self-Assessed Affect, 2) Atypical Affect, and 3) Crying. Since emotion and intent are perceived at suprasegmental levels, we explore a variety of utterance-level embeddings. The work includes experiments with both automatically derived and knowledge-inspired features that capture spoken intent at various acoustic levels. We have also investigated incorporating utterance-level embeddings at the text level using an off-the-shelf phone decoder. The experiments impose constraints on, and manipulate, the training procedure using heuristics drawn from the data distribution. We conclude by presenting preliminary results on the development and blind test sets.


DOI: 10.21437/Interspeech.2018-2149

Cite as: Rallabandi, S., Karki, B., Viegas, C., Nyberg, E., Black, A.W. (2018) Investigating Utterance Level Representations for Detecting Intent from Acoustics. Proc. Interspeech 2018, 516-520, DOI: 10.21437/Interspeech.2018-2149.


@inproceedings{Rallabandi2018,
  author={SaiKrishna Rallabandi and Bhavya Karki and Carla Viegas and Eric Nyberg and Alan W Black},
  title={Investigating Utterance Level Representations for Detecting Intent from Acoustics},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={516--520},
  doi={10.21437/Interspeech.2018-2149},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2149}
}