This paper addresses the problem of automatically labeling focus word pairs in spontaneous spoken English, where a focus word pair consists of a salient part of the text or speech and the word that motivates it. Predicting focus word pairs is important for speech applications such as expressive text-to-speech (TTS) synthesis and speech recognition. It can also improve textual and intention understanding in spoken dialog systems. Traditional approaches such as support vector machine (SVM) prediction neglect the dependency between words and suffer from the imbalanced distribution of positive and negative samples in the dataset. This paper introduces conditional random fields (CRFs) to the task of automatically predicting focus word pairs from lexical, syntactic and semantic features. Furthermore, several new features related to syntactic and semantic information are proposed to achieve better performance. Experiments on the publicly available Switchboard corpus demonstrate that the CRF model outperforms both the baseline and the SVM model for focus word pair prediction, and that the newly proposed features further improve the performance of the CRF-based predictor. Specifically, compared to the low recall rate of 11.31% achieved by the SVM model, the proposed CRF-based predictor yields a high recall rate of 70.88% with little impact on precision.
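The recall and precision figures above can be read against a small worked example. The sketch below is illustrative only: the label names (`FOCUS`, `O`), the helper `precision_recall`, and the toy label sequences are hypothetical and are not taken from the paper or from Switchboard. It shows how a predictor that rarely emits the minority class on imbalanced data can keep precision high while recall stays low.

```python
def precision_recall(gold, pred, positive="FOCUS"):
    """Compute precision and recall for one positive label.

    gold and pred are equal-length lists of per-word labels.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 3 positive words out of 10 (imbalanced).
gold = ["O", "O", "FOCUS", "O", "O", "O", "FOCUS", "O", "O", "FOCUS"]

# A conservative predictor that labels only one word as positive.
conservative = ["O", "O", "FOCUS", "O", "O", "O", "O", "O", "O", "O"]

precision, recall = precision_recall(gold, conservative)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the single positive prediction is correct, so precision is 1.0, but two of the three positive words are missed, so recall is one third.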
Bibliographic reference. Zang, Xiao / Wu, Zhiyong / Meng, Helen / Jia, Jia / Cai, Lianhong (2014): "Using conditional random fields to predict focus word pair in spontaneous spoken English", In INTERSPEECH-2014, 756-760.