In this paper we present a modified cosine similarity metric that makes features more discriminative. The new metric is defined via various linear transformations of the original feature space into a space in which the samples are better separated. These transformations are learned from a set of constraints that represent available domain knowledge, by solving the related optimization problems. We present results on two natural language call routing datasets that show significant improvements, ranging from 3% to 5% absolute, in the purity of clusters obtained in an unsupervised fashion.
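As a rough illustration of the idea, the sketch below computes cosine similarity after mapping both vectors through a linear transformation. The abstract does not give the exact form of the metric or how the transformation is learned, so the function names, the matrix `A`, and the toy vectors are all assumptions made for this example; here `A` is fixed by hand rather than learned from constraints.

```python
import math

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def modified_cosine(x, y, A):
    """Cosine similarity of x and y after applying the linear map A to both.

    Assumed form: cos_A(x, y) = (Ax . Ay) / (|Ax| |Ay|).
    Returns 0.0 if either transformed vector has zero length.
    """
    ax, ay = matvec(A, x), matvec(A, y)
    dot = sum(p * q for p, q in zip(ax, ay))
    norm = math.sqrt(sum(p * p for p in ax)) * math.sqrt(sum(q * q for q in ay))
    return dot / norm if norm > 0 else 0.0

# Toy data (hypothetical): two samples assumed to belong to the same cluster.
x = [1.0, 1.0]
y = [1.0, 0.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # leaves the space unchanged: plain cosine
A = [[1.0, 0.0], [0.0, 0.0]]          # hand-picked map that ignores feature 2

plain = modified_cosine(x, y, identity)
transformed = modified_cosine(x, y, A)
print(plain, transformed)
```

With the identity matrix the function reduces to ordinary cosine similarity. With the hand-picked `A`, the feature on which the two samples disagree is suppressed, so their similarity rises. A learned transformation would be chosen so that pairs known to belong together move closer in this sense and pairs known to differ move apart.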
Bibliographic reference. Rachevsky, Leonid / Kanevsky, Dimitri / Sarikaya, Ruhi / Ramabhadran, Bhuvana (2011): "Clustering with modified cosine distance learned from constraints", In INTERSPEECH-2011, 1313-1316.