Natural language understanding (NLU) systems for speech applications require large quantities of annotated data. We investigate the use of a domain-independent, machine-translation-based paraphrase system to improve NLU performance without incurring the cost of obtaining additional annotated data. Our experimental system incorporates Support Vector Machine (SVM) domain and intent models to detect intents, and a conditional random field (CRF) model to identify semantic slots in a given query. Two approaches are compared. In the first, we retrain models using generated paraphrases to augment the original training set. In the second, we use paraphrases as supplementary features of the original queries in the SVM and CRF models. Experiments in four domains indicate that incorporating paraphrases yields useful performance gains, and that the feature-based approach provides more stable performance than synthetic augmentation of the training data.
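The second approach described above, using paraphrases as supplementary features, can be illustrated with a minimal sketch. The function below is hypothetical (the abstract does not specify the paper's actual feature templates): it builds bag-of-words indicator features from the original query and adds separate, averaged paraphrase-word features that an SVM intent classifier or CRF slot tagger could consume alongside the original features.

```python
def extract_features(query, paraphrases):
    """Combine query features with supplementary paraphrase features.

    Assumed feature scheme (illustrative only): binary bag-of-words
    features ("w=token") for the original query, plus paraphrase-word
    features ("pw=token") averaged over the generated paraphrases.
    """
    feats = {}
    # Standard features from the original query.
    for tok in query.lower().split():
        feats["w=" + tok] = 1.0
    # Supplementary features derived from the paraphrases; averaging
    # keeps the feature values comparable regardless of how many
    # paraphrases the MT-based system generated.
    if paraphrases:
        weight = 1.0 / len(paraphrases)
        for p in paraphrases:
            for tok in p.lower().split():
                feats["pw=" + tok] = feats.get("pw=" + tok, 0.0) + weight
    return feats


# Example: a query with two MT-generated paraphrases.
features = extract_features(
    "show me flights",
    ["display flights for me", "list flights"],
)
```

A word such as "flights" that recurs across paraphrases receives a higher paraphrase-feature value, letting the downstream model weight evidence that survives rephrasing, while the original-query features are kept distinct so the model can fall back on them when paraphrases are noisy.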
Bibliographic reference. Liu, Xiaohu / Sarikaya, Ruhi / Brockett, Chris / Quirk, Chris / Dolan, William B. (2013): "Paraphrase features to improve natural language understanding". In: Proceedings of INTERSPEECH 2013, pp. 3776–3779.