End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios

Swapnil Bhosale, Imran Sheikh, Sri Harsha Dumpala, Sunil Kumar Kopparapu


End-to-end Spoken Language Understanding (SLU) systems, which skip speech-to-text conversion, are promising in low-resource scenarios. They can be more effective when there is not enough labeled data to train reliable speech recognition and language understanding systems, or when running SLU on the edge is preferred over cloud-based services. In this paper, we present an approach for bootstrapping end-to-end SLU in low-resource scenarios. We show that incorporating layers extracted from pre-trained acoustic models, instead of using the typical Mel filter bank features, leads to better-performing SLU models. Moreover, the layers extracted from a model pre-trained on one language perform well even (a) for SLU tasks on a different language and (b) on utterances from speakers with speech disorders.
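The core idea of replacing Mel filter bank inputs with intermediate-layer activations of a pre-trained acoustic model can be sketched as follows. This is an illustrative toy, not the paper's implementation: the layer sizes, random weights, and mean-pooling step are assumptions, and in practice the weights would be loaded from an acoustic model pre-trained on a (possibly different) language.

```python
import numpy as np

# Hypothetical sketch: pass Mel filter bank frames through the lower layers of a
# pre-trained acoustic model and use an intermediate layer's activations as the
# input features for the SLU model, instead of the raw Mel features.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Stand-in for a pre-trained acoustic model's lower layers; real weights would
# come from a trained model, possibly for a different language.
W1 = rng.standard_normal((40, 128)) * 0.1   # layer 1: 40-dim Mel -> 128
W2 = rng.standard_normal((128, 128)) * 0.1  # layer 2: 128 -> 128

def extract_features(mel_frames, num_layers=2):
    """Return activations of an intermediate layer, one vector per frame."""
    h = relu(mel_frames @ W1)
    if num_layers >= 2:
        h = relu(h @ W2)
    return h

# 50 frames of 40-dim Mel filter bank features for one utterance.
mel = rng.standard_normal((50, 40))
feats = extract_features(mel)

# One simple way to feed a small SLU classifier: mean-pool over time.
utt_vec = feats.mean(axis=0)
print(feats.shape, utt_vec.shape)  # (50, 128) (128,)
```

The `num_layers` knob reflects the design question of which intermediate layer to tap: lower layers stay closer to the acoustics, while deeper layers are more task-specific to the original pre-training objective.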


DOI: 10.21437/Interspeech.2019-2366

Cite as: Bhosale, S., Sheikh, I., Dumpala, S.H., Kopparapu, S.K. (2019) End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios. Proc. Interspeech 2019, 1188-1192, DOI: 10.21437/Interspeech.2019-2366.


@inproceedings{Bhosale2019,
  author={Swapnil Bhosale and Imran Sheikh and Sri Harsha Dumpala and Sunil Kumar Kopparapu},
  title={{End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1188--1192},
  doi={10.21437/Interspeech.2019-2366},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2366}
}