Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU) because of automatic speech recognition (ASR) errors. To improve the performance of slot filling, a successful approach is to use a statistical model that is trained on ASR one-best hypotheses. The state of the art models for slot filling rely on using discriminative sequence modeling methods, such as conditional random fields (CRFs), recurrent neural networks (RNNs) and the recent recurrent CRF (R-CRF) model. In our previous work, we have also proposed the combination model of CRF and deep belief network (CRF-DBN). However, they are mostly trained with the one-best hypotheses from the ASR system. In this paper, we propose to exploit word confusion networks (WCNs) by taking the word bins in a WCN as training or testing units instead of the independent words. The units are represented by vectors composed of multiple aligned ASR hypotheses and the corresponding posterior probabilities. Before training the model, we cluster similar units that may originate from the same word. We apply our proposed method to the CRF, CRF-DBN and R-CRF models. The experiments on ATIS corpus show consistent improvements of the performance by using WCNs.
Bibliographic reference. Yang, Xiaohao / Liu, Jia (2015): "Using word confusion networks for slot filling in spoken language understanding", In INTERSPEECH-2015, 1353-1357.