Capsule Networks for Low Resource Spoken Language Understanding

Vincent Renkens, Hugo van Hamme


Designing a spoken language understanding system for command-and-control applications can be challenging because of the wide variety of domains and users or because of a lack of training data. In this paper we discuss a system that learns from scratch from user demonstrations. This approach has the advantage that the same system can be used for many domains and users without modification, and that no training data is required prior to deployment. Because the user has to train the system, minimizing the required amount of data is crucial for a user-friendly experience. We therefore investigate whether a capsule network can make efficient use of the limited amount of available training data. We compare the proposed model to an approach based on Non-negative Matrix Factorisation, which is the state of the art in this setting, and to another deep learning approach that was recently introduced for end-to-end spoken language understanding. We show that the proposed model outperforms the baseline models for three command-and-control applications: controlling a small robot, a vocally guided card game, and a home automation task.


DOI: 10.21437/Interspeech.2018-1013

Cite as: Renkens, V., van Hamme, H. (2018) Capsule Networks for Low Resource Spoken Language Understanding. Proc. Interspeech 2018, 601-605, DOI: 10.21437/Interspeech.2018-1013.


@inproceedings{Renkens2018,
  author={Vincent Renkens and Hugo {van Hamme}},
  title={Capsule Networks for Low Resource Spoken Language Understanding},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={601--605},
  doi={10.21437/Interspeech.2018-1013},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1013}
}