In this paper, we propose a soft-decision, unsupervised clustering algorithm for the automatic induction of semantic classes. Rather than deterministically assigning each word to a single semantic class, the algorithm uses the probability of class membership for each word. Semantic classes are induced by an unsupervised, automatic procedure that uses a context-based distance to measure semantic similarity between words. The proposed soft-decision algorithm is compared with various "hard" clustering algorithms and is shown to improve semantic class induction performance in terms of both precision and recall on a travel reservation corpus. It is also shown that additional performance improvement is achieved by combining (auto-induced) semantic information with lexical information to derive the semantic similarity distance.
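To make the distinction between hard and soft assignment concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that a word's context is described by counts of its immediate left and right neighbours, that similarity is the cosine between those count vectors, and that class membership probabilities are obtained by normalising the average similarity to each class's current members. The function names (`context_vectors`, `cosine`, `soft_memberships`), the toy sentences, and the class labels are all hypothetical.

```python
from collections import Counter
import math


def context_vectors(sentences):
    """Count immediate left/right neighbours of every word (assumed feature set)."""
    ctx = {}
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            counts = ctx.setdefault(tokens[i], Counter())
            counts[("L", tokens[i - 1])] += 1  # left neighbour
            counts[("R", tokens[i + 1])] += 1  # right neighbour
    return ctx


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def soft_memberships(word, classes, ctx):
    """Return a probability per class instead of a single hard class label."""
    scores = {}
    for name, members in classes.items():
        sims = [cosine(ctx[word], ctx[m]) for m in members if m in ctx and m != word]
        scores[name] = sum(sims) / len(sims) if sims else 0.0
    total = sum(scores.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {name: 1.0 / len(classes) for name in classes}
    return {name: s / total for name, s in scores.items()}


# Toy travel-style sentences (hypothetical data, not the corpus from the paper).
sentences = [
    "fly to boston", "fly to denver", "fly to chicago",
    "fly on monday", "leave on tuesday", "leave on friday",
]
classes = {"CITY": ["boston", "denver"], "DAY": ["monday", "tuesday"]}

ctx = context_vectors(sentences)
probs = soft_memberships("chicago", classes, ctx)
print(probs)
```

A hard-decision variant would keep only the highest-scoring class for "chicago"; the soft variant retains the full distribution, so a word with ambiguous contexts can contribute to more than one class.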
Bibliographic reference. Iosif, Elias / Potamianos, Alexandros (2007): "A soft-clustering algorithm for automatic induction of semantic classes", In INTERSPEECH-2007, 1609-1612.