11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

A Semi-Supervised Cluster-and-Label Approach for Utterance Classification

Amparo Albalate (1), Aparna Suchindranath (1), David Suendermann (2), Wolfgang Minker (1)

(1) Universität Ulm, Germany
(2) SpeechCycle Labs, USA

In this paper we propose a novel cluster-and-label semi-supervised algorithm for utterance classification. The approach assumes that the underlying class distribution is roughly captured by fully unsupervised clustering. A minimal number of labeled examples is then used to automatically label the extracted clusters, so that the initial label set is "augmented" to cover the whole clustered data set. The optimal cluster labeling is obtained by means of the Hungarian algorithm, traditionally used to solve assignment optimization problems. Finally, the augmented labeled set is used to train an SVM classifier. This semi-supervised approach has been compared to a fully supervised version, in which the initial labeled sets are directly used to train the SVM model.
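
The cluster-labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one cluster per class, scores a cluster/label pairing by how many labeled seed examples it covers, and replaces the Hungarian algorithm with an exhaustive search over one-to-one assignments, which reaches the same optimum but is only practical for a handful of clusters. For larger problems, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time. The function names, the cluster indices, and the class labels ("billing", "internet", "tv") are hypothetical.

```python
from collections import Counter
from itertools import permutations


def label_clusters(cluster_ids, seed_labels):
    """Map each cluster to one class label, maximizing agreement with the seeds.

    cluster_ids: list of cluster indices, one per utterance.
    seed_labels: dict {utterance index: class label} for the few labeled examples.
    """
    clusters = sorted(set(cluster_ids))
    labels = sorted(set(seed_labels.values()))
    if len(clusters) != len(labels):
        # Assumption of this sketch, not a claim about the paper.
        raise ValueError("this sketch assumes one cluster per class")

    # overlap[(c, y)] = number of seed examples with label y that fell in cluster c
    overlap = Counter((cluster_ids[i], y) for i, y in seed_labels.items())

    # Exhaustive search over one-to-one assignments
    # (a stand-in for the Hungarian algorithm; same optimum, factorial cost).
    best = max(
        permutations(labels),
        key=lambda perm: sum(overlap[(c, y)] for c, y in zip(clusters, perm)),
    )
    return dict(zip(clusters, best))


def augment_labels(cluster_ids, cluster_to_label):
    """Propagate each cluster's label to every utterance in that cluster."""
    return [cluster_to_label[c] for c in cluster_ids]


# Toy data: nine utterances in three clusters, five of them labeled.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
seed_labels = {0: "billing", 1: "billing", 3: "internet", 5: "billing", 7: "tv"}

mapping = label_clusters(cluster_ids, seed_labels)
augmented = augment_labels(cluster_ids, mapping)
```

In this toy example the seed at index 5 disagrees with its cluster's assigned label; the one-to-one constraint keeps "billing" from claiming two clusters. The resulting `augmented` list plays the role of the augmented labeled set that would then be passed to the SVM training step.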


Bibliographic reference. Albalate, Amparo / Suchindranath, Aparna / Suendermann, David / Minker, Wolfgang (2010): "A semi-supervised cluster-and-label approach for utterance classification", in INTERSPEECH-2010, 2510-2513.