1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages

Porto Salvo, Portugal
September 3-4, 2009

Unsupervised SVM based 2-Speaker Clustering

Binda Celestino, Hugo Cordeiro, Carlos Meneses Ribeiro

Multimedia and Machine Learning Group, Department of Electronic Telecommunication and Computer Engineering, Instituto Superior de Engenharia de Lisboa (ISEL), Portugal

This paper proposes two algorithms for unsupervised 2-speaker clustering. The first creates two SVM models, one for each speaker. The second creates a single SVM model, with each speaker assigned to one of its two classes. Both clustering algorithms are based on the traditional two-class SVM and use mel line spectrum frequency (MLSF) coefficients as acoustic features to represent the speakers. Tests were conducted on the audio streams of two interview videos in Portuguese, each with two male speakers. The results must be considered preliminary, but no errors were found when the speech segmentation was performed correctly.

Index Terms: speaker clustering, speech segmentation, speaker segmentation, support vector machine, mel line spectrum frequencies.

Bibliographic reference. Celestino, Binda / Cordeiro, Hugo / Meneses Ribeiro, Carlos (2009): "Unsupervised SVM based 2-speaker clustering", in SLTECH-2009, 81-83.