EUROSPEECH 2003 - INTERSPEECH 2003
State-of-the-art spoken language understanding systems are trained using labeled utterances, which are labor-intensive and time-consuming to prepare. In this paper, we propose methods for exploiting unlabeled data in a statistical call classification system within a natural language dialog system. The basic assumption is that some amount of labeled data and a relatively larger amount of unlabeled data are available. The first method augments the training data with machine-labeled call-types for the unlabeled utterances. The second method instead augments the classification model trained on the human-labeled utterances with the machine-labeled ones in a weighted manner. We have evaluated these methods using a call classification system used in the AT&T natural dialog customer care system. For call classification, we have used a boosting algorithm. Our results indicate that, when the unlabeled data is utilized, the same classification performance can be obtained with 30% less labeled data. This corresponds to a 1-1.5% absolute reduction in classification error rate when using the same amount of labeled data.
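The first method can be illustrated with a minimal sketch: a seed classifier trained on human-labeled utterances assigns call-types to the unlabeled utterances, and a new model is then trained on both sets. Everything in the block is an assumption made for illustration. The abstract gives no implementation details, so a tiny naive Bayes model stands in for the boosting classifier, and the names `BagOfWordsClassifier`, `self_train`, and `machine_weight`, as well as the call-type labels, are hypothetical. Setting `machine_weight` below 1.0 only loosely mimics the idea of weighting machine-labeled data; it is not the model-augmentation formulation of the second method, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


class BagOfWordsClassifier:
    """Tiny multinomial naive Bayes; a stand-in for the boosting classifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # call-type -> word -> weighted count
        self.label_counts = Counter()            # call-type -> weighted utterance count
        self.vocab = set()

    def fit(self, utterances, labels, weights=None):
        """Add (optionally weighted) examples to the model's counts."""
        weights = weights or [1.0] * len(utterances)
        for text, label, weight in zip(utterances, labels, weights):
            self.label_counts[label] += weight
            for word in text.lower().split():
                self.word_counts[label][word] += weight
                self.vocab.add(word)
        return self

    def predict(self, text):
        """Return the call-type with the highest smoothed log-probability."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1.0) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def self_train(labeled, unlabeled, machine_weight=1.0):
    """Machine-label the unlabeled utterances, then retrain on both sets.

    machine_weight is a hypothetical knob: 1.0 treats machine labels like
    human labels; smaller values down-weight them.
    """
    texts, labels = zip(*labeled)
    seed = BagOfWordsClassifier().fit(texts, labels)
    machine_labels = [seed.predict(utterance) for utterance in unlabeled]

    model = BagOfWordsClassifier().fit(texts, labels)
    model.fit(unlabeled, machine_labels, [machine_weight] * len(unlabeled))
    return model, machine_labels


# Toy data; the utterances and call-type names are invented for illustration.
labeled = [
    ("i want to pay my bill", "Pay_Bill"),
    ("pay bill please", "Pay_Bill"),
    ("what is my account balance", "Account_Balance"),
    ("check balance on my account", "Account_Balance"),
]
unlabeled = ["i need to pay a bill", "tell me my balance"]

model, machine_labels = self_train(labeled, unlabeled, machine_weight=0.5)
print(machine_labels)
print(model.predict("pay a bill"))
```

In this sketch the retrained model also learns words that occur only in the unlabeled utterances, which is one intuition for why machine-labeled data can help when human-labeled data is scarce.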
Bibliographic reference. Tur, Gokhan / Hakkani-Tur, Dilek Z. (2003): "Exploiting unlabeled utterances for spoken language understanding", In EUROSPEECH-2003, 2793-2796.