Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding

Vedran Vukotić, Christian Raymond


The typical RNN (Recurrent Neural Network) pipeline in SLU (Spoken Language Understanding), and specifically in the slot-filling task, consists of three stages: word embedding, context window representation, and label prediction. Label prediction, as a classification task, is the stage that shapes a sensible context window representation during learning through back-propagation. However, due to natural variations in the data, differences between two same-labeled samples can lead to dissimilar representations, whereas similarities between two differently-labeled samples can lead to close representations. In computer vision applications, specifically in face recognition and person re-identification, this problem has recently been successfully tackled by introducing data triplets and a triplet loss function.
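As an illustration of the triplet loss idea mentioned above, here is a minimal sketch of the hinge-style formulation commonly used in face recognition. The abstract does not state the exact loss used in the paper, so the squared Euclidean distance, the `margin` value, and the function names `squared_distance` and `triplet_loss` are assumptions made for this example.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    """
    d_pos = squared_distance(anchor, positive)  # same-label pair
    d_neg = squared_distance(anchor, negative)  # different-label pair
    return max(0.0, d_pos - d_neg + margin)


# Negative already far away: no penalty.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0])
# Negative closer than the positive: the triplet is penalised.
violated = triplet_loss([0.0, 0.0], [2.0, 0.0], [0.0, 1.0])
```

Minimising such a loss pulls same-labeled representations together and pushes differently-labeled ones apart, which is the behaviour the abstract describes.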

In SLU, each word can be mapped to one or multiple labels depending on small variations of its context. We exploit this fact to construct data triplets consisting of the same word in different contexts, forming one pair of datapoints with matching target labels and another pair with non-matching labels. By using these triplets and an additional loss function, we update the context window representation so that dissimilar samples become more distant and similar samples become closer, leading to better classification results and an improved rate of convergence.
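The triplet construction described above can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the helper names `context_window` and `mine_triplets`, the window size, the padding token, and the toy sentences with their slot labels are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def context_window(tokens, i, size=1, pad="<pad>"):
    """Return the window of 2*size+1 tokens centred on position i."""
    padded = [pad] * size + list(tokens) + [pad] * size
    return tuple(padded[i:i + 2 * size + 1])


def mine_triplets(sentences, size=1):
    """Build (anchor, positive, negative) context windows per word.

    Anchor and positive share the centre word and its label; the
    negative shares the centre word but carries a different label.
    """
    # word -> label -> list of context windows in which it occurs
    by_word = defaultdict(lambda: defaultdict(list))
    for tokens, labels in sentences:
        for i, (word, label) in enumerate(zip(tokens, labels)):
            by_word[word][label].append(context_window(tokens, i, size))

    triplets = []
    for by_label in by_word.values():
        if len(by_label) < 2:
            continue  # word always has the same label: nothing to mine
        for label, windows in by_label.items():
            negatives = [w for other, ws in by_label.items()
                         if other != label for w in ws]
            for anchor, positive in permutations(windows, 2):
                for negative in negatives:
                    triplets.append((anchor, positive, negative))
    return triplets


# Toy data with hypothetical slot labels: "boston" is polysemous.
sentences = [
    (["from", "boston", "to", "denver"], ["O", "B-from", "O", "B-to"]),
    (["from", "denver", "to", "boston"], ["O", "B-from", "O", "B-to"]),
    (["to", "boston", "please"], ["O", "B-to", "O"]),
]
triplets = mine_triplets(sentences)
```

In this toy data only "boston" yields triplets, because it is the only word that has at least two occurrences under one label and one occurrence under another. In a real system the windows would be mapped to their learned representations before a triplet loss is applied.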


DOI: 10.21437/Interspeech.2019-2977

Cite as: Vukotić, V., Raymond, C. (2019) Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding. Proc. Interspeech 2019, 1178-1182, DOI: 10.21437/Interspeech.2019-2977.


@inproceedings{Vukotić2019,
  author={Vedran Vukotić and Christian Raymond},
  title={{Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1178--1182},
  doi={10.21437/Interspeech.2019-2977},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2977}
}