We describe a unified multi-turn, multi-task spoken language understanding (SLU) solution capable of handling multiple context-sensitive classification (intent determination) and sequence labeling (slot filling) tasks simultaneously. The proposed architecture is based on recurrent convolutional neural networks (RCNNs) with shared feature layers and globally normalized sequence modeling components. Temporal dependencies within and across different tasks are encoded succinctly as recurrent connections. Dialog system responses from beyond the SLU component are also exploited as effective external features. We show with extensive experiments on a number of datasets that the proposed joint learning framework produces state-of-the-art results for both classification and tagging, and that contextual modeling based on recurrent and external features significantly improves the context sensitivity of SLU models.
Bibliographic reference. Liu, Chunxi / Xu, Puyang / Sarikaya, Ruhi (2015): "Deep contextual language understanding in spoken dialogue systems", In INTERSPEECH-2015, 120-124.