In this paper, we present a novel approach to speaker clustering that uses a hetero-associative neural network (HANN) to compute very low-dimensional speaker-discriminatory features (in our case, 1-dimensional) in a data-driven manner. A HANN trained to map the input feature space onto speaker labels through a bottleneck hidden layer is expected to learn a very low-dimensional feature subspace that essentially contains speaker information. These low-dimensional features are then used in a simple k-means clustering algorithm to obtain the speaker segmentation. Evaluation of this approach on a database of real-life conversational speech from call centers shows that the clustering performance achieved is similar to that of state-of-the-art systems, even though our approach uses just 1-dimensional features. Augmenting these features with the traditional mel-frequency cepstral coefficient (MFCC) features in the state-of-the-art system resulted in improved clustering performance.
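As a rough illustration of the pipeline described above, the following minimal sketch projects each input frame onto a single bottleneck unit and then clusters the resulting 1-dimensional values with k-means. It is not the authors' implementation: the function names `bottleneck_feature` and `kmeans_1d`, the single tanh unit, the hand-picked weights, and the quantile-based centroid initialization are all assumptions made for the example. In the actual approach the bottleneck weights would come from training the HANN against speaker labels.

```python
import math


def bottleneck_feature(frame, weights, bias):
    """Project one input frame onto a single tanh bottleneck unit.

    The weights and bias are assumed to be already trained; here they
    are hypothetical placeholders.
    """
    return math.tanh(sum(w * x for w, x in zip(weights, frame)) + bias)


def kmeans_1d(values, k, max_iters=100):
    """Plain k-means on scalar values; returns (labels, centroids)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    if k == 1:
        centroids = [sum(ordered) / len(ordered)]
    else:
        # Deterministic start: evenly spaced order statistics (an assumption).
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every value.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-dimensional "frames"; real input would be acoustic feature vectors.
    frames = [[0.9, 0.1], [1.1, -0.2], [-1.0, 0.3], [-0.8, 0.0]]
    features = [bottleneck_feature(f, [2.0, 0.0], 0.0) for f in frames]
    labels, centroids = kmeans_1d(features, k=2)
    print(labels, centroids)
```

With two speakers, frames that receive the same label would be attributed to the same speaker. Because the features are scalars, the clustering step is very cheap compared with clustering full MFCC vectors.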
Bibliographic reference. Ikbal, Shajith / Visweswariah, Karthik (2008): "Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering", In INTERSPEECH-2008, 28-31.