9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Clustering Initialization Based on Spatial Information for Speaker Diarization of Meetings

J. Luque, Carlos Segura, Javier Hernando

Universitat Politècnica de Catalunya, Spain

This paper proposes an initialization for an agglomerative system applied to speaker diarization in the meeting environment. The initialization is based on a prior clustering of the temporal sequence of Time Delay of Arrival (TDOA) estimates between pairs of sensors. The purpose of this initial clustering is to obtain initial classes that each contain speech from a single speaker. The aim is to ensure the purity of the initial segments by exploiting the positions of the speakers over the course of a meeting. The TDOA initialization was tested on the dataset used in the RT07s evaluation, where it improves the diarization error rate with respect to the classical uniform initialization. Most of the experiments show that purer initial segments lead to a better clustering in the subsequent hierarchical strategy based on cepstral features.
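
As a rough illustration of the idea described above, the following sketch estimates a TDOA for one microphone pair and groups per-frame TDOA values into initial classes. It is not the authors' implementation: the abstract does not say how the TDOAs are estimated or clustered, so the use of GCC-PHAT, the 1-D k-means grouping, the single microphone pair, and the names `gcc_phat_tdoa` and `cluster_tdoas` are all assumptions made for this example.

```python
import numpy as np


def gcc_phat_tdoa(x, y, fs, max_tau):
    """Estimate the delay (in seconds) of y relative to x with GCC-PHAT.

    Assumption: GCC-PHAT is used here only as a common TDOA estimator;
    the abstract does not name the estimator used in the paper.
    """
    n = len(x) + len(y)  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = min(int(fs * max_tau), n // 2)
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs


def cluster_tdoas(tdoas, n_clusters, n_iter=20):
    """Group a sequence of TDOA values with a simple 1-D k-means.

    Assumption: k-means stands in for whatever clustering the paper uses.
    Returns one label per frame plus the cluster centers.
    """
    tdoas = np.asarray(tdoas, dtype=float)
    # Deterministic start: spread the centers over the TDOA quantiles.
    centers = np.quantile(tdoas, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(tdoas[:, None] - centers[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = tdoas[labels == k].mean()
    labels = np.argmin(np.abs(tdoas[:, None] - centers[None, :]), axis=1)
    return labels, centers
```

In this sketch, frames that share a label would form one initial class for the agglomerative stage, in place of the uniform split of the recording. A real meeting setup would typically combine TDOAs from several microphone pairs into one feature vector per frame before clustering.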

Bibliographic reference.  Luque, J. / Segura, Carlos / Hernando, Javier (2008): "Clustering initialization based on spatial information for speaker diarization of meetings", In INTERSPEECH-2008, 383-386.