EUROSPEECH 2003 - INTERSPEECH 2003
State-of-the-art spoken language understanding (SLU) systems are trained using human-labeled utterances, the preparation of which is labor intensive and time consuming. Labeling is an error-prone process for various reasons, such as labeler errors or imperfect descriptions of the classes. Thus, one or more additional labeling passes are usually required to check and fix the errors and inconsistencies of earlier passes. In this paper, we examine the effect of labeling errors on statistical call classification and evaluate methods for finding and correcting these errors while checking a minimal amount of data. We describe two alternative methods to speed up the labeling effort: one is based on the confidences obtained from a prior model, and the other is completely unsupervised. We call the labeling process that employs one of these methods active labeling. Active labeling aims to minimize the number of utterances to be checked again by automatically selecting the ones that are likely to be erroneous or inconsistent with the previously labeled examples. Although the very same methods can be used as a postprocessing step to correct labeling errors, we consider them only as part of the labeling process. We have evaluated these active labeling methods using a call classification system used for the AT&T natural dialog customer care system. Our results indicate that it is possible to find about 90% of the labeling errors or inconsistencies by checking just half the data.
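The abstract does not give the exact selection criterion, so the following is only a minimal sketch of the confidence-based variant under stated assumptions. It assumes a prior model supplies a probability for each call type per utterance, and it ranks utterances by the probability that model assigns to the human-chosen label, lowest first. The function name `select_for_recheck`, the `fraction` parameter, the call-type names, and all example values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def select_for_recheck(
    labeled: List[Tuple[str, str]],
    posteriors: List[Dict[str, float]],
    fraction: float = 0.5,
) -> List[int]:
    """Return indices of utterances to re-check, most suspicious first.

    labeled    -- (utterance, human_label) pairs from the first labeling pass
    posteriors -- per-utterance class probabilities from a prior model
                  (assumed input; the model itself is not shown here)
    fraction   -- share of the data the labelers are willing to re-check
    """
    if len(labeled) != len(posteriors):
        raise ValueError("labeled and posteriors must have the same length")

    # Assumed score: the prior model's probability for the human label.
    # A class the model never proposed gets probability 0.0.
    scores = [
        post.get(label, 0.0)
        for (_, label), post in zip(labeled, posteriors)
    ]

    # Lowest confidence in the human label comes first.
    order = sorted(range(len(labeled)), key=lambda i: scores[i])
    budget = int(len(labeled) * fraction)
    return order[:budget]


# Hypothetical first-pass labels and prior-model outputs.
labeled = [
    ("i want to pay my bill", "Pay_Bill"),
    ("what is my balance", "Pay_Bill"),
    ("cancel my service", "Cancel_Service"),
    ("talk to a person", "Request_Agent"),
]
posteriors = [
    {"Pay_Bill": 0.9, "Account_Balance": 0.1},
    {"Pay_Bill": 0.2, "Account_Balance": 0.8},
    {"Cancel_Service": 0.7, "Request_Agent": 0.3},
    {"Request_Agent": 0.6, "Cancel_Service": 0.4},
]

to_check = select_for_recheck(labeled, posteriors, fraction=0.5)
print(to_check)
```

With `fraction=0.5`, only half of the utterances are returned for a second pass, mirroring the setting in which the abstract reports finding about 90% of the errors. The unsupervised method mentioned in the abstract is not sketched here, because the abstract gives no detail on how it works.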
Bibliographic reference. Tur, Gokhan / Rahim, Mazin / Hakkani-Tur, Dilek Z. (2003): "Active labeling for spoken language understanding", In EUROSPEECH-2003, 2789-2792.