12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Clustering with Modified Cosine Distance Learned from Constraints

Leonid Rachevsky, Dimitri Kanevsky, Ruhi Sarikaya, Bhuvana Ramabhadran

IBM T.J. Watson Research Center, USA

In this paper we present a modified cosine similarity metric that makes features more discriminative. The new metric is defined via various linear transformations of the original feature space into a space in which the samples are better separated. These transformations are learned from a set of constraints that represent available domain knowledge, by solving the related optimization problems. We present results on two natural language call routing datasets that show significant improvements, ranging from 3% to 5% absolute, in the purity of clusters obtained in an unsupervised fashion.
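
As a rough illustration of the general idea only, the sketch below computes cosine similarity after mapping both vectors through a linear transformation A. The abstract does not specify the form of the transformations or how they are learned from constraints, so the function names (`transform`, `modified_cosine`) and the example matrices here are hypothetical and chosen by hand rather than learned.

```python
import math

def transform(A, x):
    """Apply the linear map A (given as a list of rows) to the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def modified_cosine(x, y, A):
    """Cosine similarity of x and y measured after mapping both through A."""
    ax, ay = transform(A, x), transform(A, y)
    dot = sum(p * q for p, q in zip(ax, ay))
    norm = math.sqrt(sum(p * p for p in ax)) * math.sqrt(sum(q * q for q in ay))
    # Assumption: return 0.0 when either transformed vector has zero length.
    return dot / norm if norm else 0.0

# Hypothetical transformations, hand-picked for illustration (not learned).
identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to the ordinary cosine similarity
scaled = [[1.0, 0.0], [0.0, 0.1]]     # down-weights the second feature

x, y = [1.0, 1.0], [1.0, -1.0]
print(modified_cosine(x, y, identity))  # orthogonal under the plain cosine
print(modified_cosine(x, y, scaled))    # much closer once feature 2 is down-weighted
```

With the identity matrix the function is the ordinary cosine similarity; changing A changes which pairs of samples look similar, which is the property a learned transformation would exploit.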

Bibliographic reference. Rachevsky, Leonid / Kanevsky, Dimitri / Sarikaya, Ruhi / Ramabhadran, Bhuvana (2011): "Clustering with modified cosine distance learned from constraints", in INTERSPEECH-2011, 1313-1316.