In this paper we propose a novel cluster-and-label semi-supervised algorithm for utterance classification. The approach assumes that the underlying class distribution is roughly captured by fully unsupervised clustering. A minimal number of labeled examples is then used to label the extracted clusters automatically, so that the initial label set is extended ("augmented") to the whole clustered data set. The optimal cluster labeling is obtained with the Hungarian algorithm, a standard method for solving assignment problems. Finally, the augmented labeled set is used to train an SVM classifier. This semi-supervised approach is compared with a fully supervised version, in which the initial labeled sets are used directly to train the SVM model.
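The cluster-labeling step can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: cluster ids are already available for every utterance, the number of clusters does not exceed the number of classes, and the assignment cost is the negated count of labeled seeds of each class in each cluster. The names `hungarian`, `cluster_and_label`, `cluster_ids`, and `seeds`, as well as the class names, are hypothetical. The clustering itself and the final SVM training are omitted.

```python
def hungarian(cost):
    """Minimum-cost assignment (Hungarian algorithm with potentials).

    `cost` is a list of rows with len(rows) <= len(columns).
    Returns a list whose entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (m + 1)    # column potentials (1-based)
    p = [0] * (m + 1)    # p[j]: row matched to column j, 0 if column is free
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:           # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def cluster_and_label(cluster_ids, seeds):
    """Assign one class to each cluster and propagate it to all members.

    cluster_ids: cluster index (0..K-1) for every utterance.
    seeds: dict mapping utterance index -> class label (the small labeled set).
    Assumed cost (not taken from the paper): negated seed counts per cluster.
    """
    classes = sorted(set(seeds.values()))
    n_clusters = max(cluster_ids) + 1
    counts = [[0] * len(classes) for _ in range(n_clusters)]
    for idx, label in seeds.items():
        counts[cluster_ids[idx]][classes.index(label)] += 1
    cost = [[-c for c in row] for row in counts]  # maximize seed overlap
    match = hungarian(cost)
    cluster_to_class = {k: classes[j] for k, j in enumerate(match)}
    augmented = [cluster_to_class[k] for k in cluster_ids]
    return augmented, cluster_to_class


# Toy data: 10 utterances in 3 clusters, 6 of them labeled.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
seeds = {0: "billing", 1: "billing", 2: "internet",
         4: "billing", 5: "billing", 7: "tv"}

labels, mapping = cluster_and_label(cluster_ids, seeds)
print(mapping)
```

In this toy data a per-cluster majority vote would label both cluster 0 and cluster 1 as "billing", whereas the one-to-one assignment gives each cluster a distinct class. In practice a library routine such as `scipy.optimize.linear_sum_assignment` could replace the hand-written `hungarian` function, and the returned `labels` would serve as training targets for the classifier.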
Bibliographic reference. Albalate, Amparo / Suchindranath, Aparna / Suendermann, David / Minker, Wolfgang (2010): "A semi-supervised cluster-and-label approach for utterance classification", In INTERSPEECH-2010, 2510-2513.