Fusion Strategy for Prosodic and Lexical Representations of Word Importance

Sushant Kafle, Cecilia Ovesdotter Alm, Matt Huenerfauth


We investigate whether, and if so when, prosodic features in spoken dialogue aid in modeling the importance of words to the overall meaning of a dialogue turn. Starting from the assumption that acoustic-prosodic cues help identify important speech content, we investigate representation architectures that combine lexical and prosodic features and evaluate them for predicting word importance. We propose an attention-based feature fusion strategy and show that strategic supervision of the attention weights yields especially competitive models. We evaluate our fusion strategy on spoken dialogues and demonstrate performance increases over state-of-the-art models. Specifically, our approach achieves the lowest root mean square error on test data and generalizes better to out-of-vocabulary words.
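
As a rough illustration of what attention-based fusion of two feature streams can look like, the following minimal sketch weights a lexical and a prosodic vector for one word with softmax attention and sums them. It is not the authors' architecture: the function names, the single `scoring_vector` (a stand-in for learned parameters), and the assumption that both vectors share a dimension are all hypothetical, and the supervision of attention weights mentioned in the abstract is not modeled. A small `rmse` helper shows the evaluation metric named above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse(lexical, prosodic, scoring_vector):
    """Attention-weighted sum of two same-sized feature vectors.

    scoring_vector is a hypothetical stand-in for learned parameters:
    it scores each modality, and the softmax of the two scores gives
    the attention weights used to mix the modalities.
    """
    weights = softmax([dot(scoring_vector, lexical),
                       dot(scoring_vector, prosodic)])
    fused = [weights[0] * l + weights[1] * p
             for l, p in zip(lexical, prosodic)]
    return fused, weights


def rmse(predicted, gold):
    """Root mean square error between predicted and gold scores."""
    squared = [(p - g) ** 2 for p, g in zip(predicted, gold)]
    return math.sqrt(sum(squared) / len(gold))
```

For example, `fuse([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])` scores both modalities equally, so each receives an attention weight of 0.5. In a trained model the scoring parameters would be learned jointly with the word importance predictor.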


DOI: 10.21437/Interspeech.2019-1898

Cite as: Kafle, S., Alm, C.O., Huenerfauth, M. (2019) Fusion Strategy for Prosodic and Lexical Representations of Word Importance. Proc. Interspeech 2019, 1313-1317, DOI: 10.21437/Interspeech.2019-1898.


@inproceedings{Kafle2019,
  author={Sushant Kafle and Cecilia Ovesdotter Alm and Matt Huenerfauth},
  title={{Fusion Strategy for Prosodic and Lexical Representations of Word Importance}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1313--1317},
  doi={10.21437/Interspeech.2019-1898},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1898}
}