INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

A Soft-Clustering Algorithm for Automatic Induction of Semantic Classes

Elias Iosif, Alexandros Potamianos

Technical University of Crete, Greece

In this paper, we propose a soft-decision, unsupervised clustering algorithm that induces semantic classes automatically. Rather than deterministically assigning each word to a single semantic class, the algorithm uses the probability of class membership for each word. Semantic similarity between words is measured with a context-based similarity distance. The proposed soft-decision algorithm is compared with various "hard" clustering algorithms, e.g., [1], and is shown to improve semantic class induction performance in terms of both precision and recall on a travel reservation corpus. Additional performance improvement is achieved by combining (auto-induced) semantic information with lexical information to derive the semantic similarity distance.
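The idea of soft class membership can be illustrated with a minimal sketch. The abstract does not specify the similarity metric or the membership formula, so everything below is an assumption made for illustration: words are represented by counts of their immediate left and right neighbours, similarity is the cosine of those count vectors, and the membership probability is the normalised average similarity to each class's current members. The function names (context_vectors, cosine, soft_membership) and the toy sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences):
    """Map each word to counts of its immediate left/right neighbours."""
    ctx = defaultdict(Counter)
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            ctx[toks[i]][("L", toks[i - 1])] += 1
            ctx[toks[i]][("R", toks[i + 1])] += 1
    return ctx


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed metric)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def soft_membership(word, classes, ctx):
    """Return an assumed P(class | word): average similarity to each
    class's members, normalised so the values sum to 1."""
    scores = {}
    for name, members in classes.items():
        sims = [cosine(ctx[word], ctx[m]) for m in members if m != word]
        scores[name] = sum(sims) / len(sims) if sims else 0.0
    total = sum(scores.values())
    if total == 0:
        # No contextual evidence: fall back to a uniform distribution.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


# Toy travel-style sentences (invented for this sketch).
sentences = [
    "fly to boston today",
    "fly to denver today",
    "fly to chicago today",
    "leave on monday please",
    "leave on friday please",
]
classes = {"city": ["boston", "denver"], "day": ["monday", "friday"]}
ctx = context_vectors(sentences)
probs = soft_membership("chicago", classes, ctx)
print(probs)
```

A hard-clustering variant would keep only the highest-scoring class for each word, whereas the soft variant retains the whole distribution, so a word with mixed contexts can contribute to more than one class.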


Bibliographic reference. Iosif, Elias / Potamianos, Alexandros (2007): "A soft-clustering algorithm for automatic induction of semantic classes", in INTERSPEECH-2007, 1609-1612.