A New Pre-Training Method for Training Deep Learning Models with Application to Spoken Language Understanding

Asli Celikyilmaz, Ruhi Sarikaya, Dilek Hakkani-Tür, Xiaohu Liu, Nikhil Ramesh, Gokhan Tur


We propose a simple and efficient approach for pre-training deep learning models, with application to slot filling tasks in spoken language understanding. The proposed approach leverages unlabeled data to train the models and is generic enough to work with any deep learning model. In this study, we consider the CNN2CRF architecture, which combines a Convolutional Neural Network (CNN) with a Conditional Random Field (CRF) as the top layer, since it has shown great potential for learning useful representations for supervised sequence learning tasks. With this architecture, the proposed pre-training approach learns feature representations from both labeled and unlabeled data at the CNN layer, covering features that would not be observed in the limited labeled data. At the CRF layer, predicted word classes from the unlabeled data serve as latent sequence labels alongside the labeled sequences. These latent labeled sequences, in principle, have a regularizing effect on the labeled sequences, yielding a better-generalized model. This allows the network to learn representations that are useful not only for slot tagging with labeled data but also for learning dependencies both within and between latent clusters of unseen words. The proposed pre-training method with the CNN2CRF architecture achieves significant gains over the strongest semi-supervised baseline.
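To make the tagging setup concrete, the following is a minimal, hypothetical sketch of the decoding step such a CNN2CRF model performs: the CNN layer produces per-token emission scores for each slot tag, and the CRF top layer selects the globally best tag sequence via Viterbi decoding over those emissions plus tag-transition scores. All tag names, scores, and transition values below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of CRF-style Viterbi decoding over per-token emission
# scores, as a CNN encoder might produce for slot filling. Scores are made up.

def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions: list (one entry per token) of {tag: score} dicts.
    transitions: {(prev_tag, tag): score} dict of CRF transition scores.
    """
    tags = list(emissions[0].keys())
    # best[t] = (cumulative score, tag path) for sequences ending in tag t
    best = {t: (emissions[0][t], [t]) for t in tags}
    for em in emissions[1:]:
        new_best = {}
        for t in tags:
            # Extend the best path ending in each previous tag p with tag t.
            score, path = max(
                (best[p][0] + transitions[(p, t)] + em[t], best[p][1] + [t])
                for p in tags
            )
            new_best[t] = (score, path)
        best = new_best
    return max(best.values())[1]


# Illustrative emissions for the utterance "flights to boston".
emissions = [
    {"O": 2.0, "B-city": 0.5},   # "flights"
    {"O": 1.0, "B-city": 0.2},   # "to"
    {"O": 0.1, "B-city": 3.0},   # "boston"
]
transitions = {
    ("O", "O"): 0.5, ("O", "B-city"): 0.2,
    ("B-city", "O"): 0.1, ("B-city", "B-city"): -0.5,
}
print(viterbi(emissions, transitions))  # → ['O', 'O', 'B-city']
```

In the paper's semi-supervised setting, the same decoding would apply to unlabeled utterances, whose predicted tag sequences then act as latent labels during training.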


DOI: 10.21437/Interspeech.2016-512

Cite as

Celikyilmaz, A., Sarikaya, R., Hakkani-Tür, D., Liu, X., Ramesh, N., Tur, G. (2016) A New Pre-Training Method for Training Deep Learning Models with Application to Spoken Language Understanding. Proc. Interspeech 2016, 3255-3259.

Bibtex
@inproceedings{Celikyilmaz+2016,
  author={Asli Celikyilmaz and Ruhi Sarikaya and Dilek Hakkani-Tür and Xiaohu Liu and Nikhil Ramesh and Gokhan Tur},
  title={A New Pre-Training Method for Training Deep Learning Models with Application to Spoken Language Understanding},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-512},
  url={http://dx.doi.org/10.21437/Interspeech.2016-512},
  pages={3255--3259}
}