9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Learning Essential Speaker Sub-Space Using Hetero-Associative Neural Networks for Speaker Clustering

Shajith Ikbal, Karthik Visweswariah

IBM India Research Lab, India

In this paper, we present a novel approach to speaker clustering that uses a hetero-associative neural network (HANN) to compute very low-dimensional speaker-discriminatory features (in our case, 1-dimensional) in a data-driven manner. A HANN trained to map the input feature space onto speaker labels through a bottleneck hidden layer is expected to learn a very low-dimensional feature subspace that essentially contains speaker information. These low-dimensional features are then used in a simple k-means clustering algorithm to obtain the speaker segmentation. Evaluation of this approach on a database of real-life conversational speech from call centers shows that the clustering performance achieved is similar to that of state-of-the-art systems, even though our approach uses just 1-dimensional features. Combining these features with the traditional mel-frequency cepstral coefficient (MFCC) features in the state-of-the-art system resulted in improved clustering performance.
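The pipeline described above (a network with a 1-unit bottleneck trained on speaker labels, followed by k-means on the bottleneck output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The network shape (input, single tanh bottleneck unit, softmax output), the synthetic two-speaker Gaussian data, the learning rate, and the function names `train_bottleneck_net`, `bottleneck_features`, and `kmeans_1d` are all assumptions made for illustration. The sketch also trains and clusters on the same data, which the abstract does not specify.

```python
import numpy as np

def train_bottleneck_net(X, y, n_speakers, steps=1000, lr=0.1, seed=0):
    """Train input -> 1-unit tanh bottleneck -> softmax over speaker labels.

    Hypothetical minimal architecture; the paper's HANN may differ.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.1, size=d)           # input -> bottleneck weights
    b = 0.0                                     # bottleneck bias
    v = rng.normal(scale=0.1, size=n_speakers)  # bottleneck -> output weights
    c = np.zeros(n_speakers)                    # output biases
    Y = np.eye(n_speakers)[y]                   # one-hot speaker targets
    for _ in range(steps):
        h = np.tanh(X @ w + b)                  # 1-D bottleneck value per frame
        logits = np.outer(h, v) + c
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)       # softmax posteriors
        g_logits = (p - Y) / n                  # gradient of mean cross-entropy
        g_h = g_logits @ v                      # backprop into the bottleneck
        g_z = g_h * (1.0 - h ** 2)              # through the tanh nonlinearity
        v -= lr * (h @ g_logits)
        c -= lr * g_logits.sum(axis=0)
        w -= lr * (X.T @ g_z)
        b -= lr * g_z.sum()
    return w, b

def bottleneck_features(X, w, b):
    """Project frames onto the learned 1-D bottleneck feature."""
    return np.tanh(X @ w + b)

def kmeans_1d(f, k=2, iters=100):
    """Plain Lloyd's k-means on scalar features, with quantile initialization."""
    centers = np.quantile(f, np.linspace(0.0, 1.0, k))
    labels = np.zeros(len(f), dtype=int)
    for _ in range(iters):
        labels = np.abs(f[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([
            f[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for acoustic frames from two speakers (assumption).
rng = np.random.default_rng(0)
n_per, dim = 200, 4
X = np.vstack([
    rng.normal(-1.5, 1.0, (n_per, dim)),
    rng.normal(1.5, 1.0, (n_per, dim)),
])
y = np.repeat([0, 1], n_per)

w, b = train_bottleneck_net(X, y, n_speakers=2)
feats = bottleneck_features(X, w, b)
labels, centers = kmeans_1d(feats, k=2)

# Cluster indices are arbitrary, so score agreement up to a label swap.
agreement = max(np.mean(labels == y), np.mean(labels != y))
print(f"cluster/speaker agreement: {agreement:.2f}")
```

Because every frame is reduced to a single scalar before clustering, the k-means step amounts to choosing thresholds along one axis, which is what makes such a simple clustering algorithm plausible here.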


Bibliographic reference. Ikbal, Shajith / Visweswariah, Karthik (2008): "Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering", In INTERSPEECH-2008, 28-31.