Selection of TDOA Parameters for MDM Speaker Diarization

Beatriz Martínez-González (1), José M. Pardo (1), Julián D. Echeverry-Correa (1), José A. Vallejo-Pinto (2), Roberto Barra-Chicote (1)

(1) Speech Technology Group, ETSI Telecomunicación, Universidad Politécnica de Madrid, Spain
(2) Department of Computer Science, University of Oviedo, Spain

Several methods to improve multiple distant microphone (MDM) speaker diarization based on Time Delay of Arrival (TDOA) features are evaluated in this paper. All of them avoid using a single reference channel to calculate the TDOA values; instead, each method applies its own criterion to select, among all possible pairs of microphones, the set of pairs used to estimate the TDOA values. The evaluated methods are named "Dynamic Margin" (DM), "Extreme Regions" (ER), "Most Common" (MC), "Cross Correlation" (XCorr) and "Principal Component Analysis" (PCA). All five methods improve on the baseline results for the development set, and four of them also improve the results for the evaluation set. On the test set, DM and ER obtain relative improvements in diarization error rate (DER) of 3.49% and 10.77% respectively, while XCorr and PCA achieve relative DER improvements of 36.72% and 30.82%. Moreover, the computational cost of the XCorr method is 20% lower than that of the baseline.
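
As an illustration of the setting described above, the following minimal Python sketch enumerates all possible microphone pairs and estimates one TDOA value (in samples) per pair. It is not the authors' implementation: the abstract does not state which delay estimator is used, so plain time-domain cross-correlation is assumed here, and the function names (estimate_delay, pairwise_tdoa, delayed) and the synthetic signals are hypothetical. The selection criteria themselves (DM, ER, MC, XCorr, PCA) are not reproduced.

```python
from itertools import combinations


def estimate_delay(x, y, max_lag):
    """Return the lag d (in samples) that maximizes sum_n x[n] * y[n + d].

    A positive result means that y is a delayed copy of x.
    Assumption: plain cross-correlation, chosen only for illustration.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):
                score += xn * y[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def pairwise_tdoa(channels, max_lag):
    """Estimate one delay for every unordered microphone pair (i, j), i < j."""
    return {
        (i, j): estimate_delay(channels[i], channels[j], max_lag)
        for i, j in combinations(range(len(channels)), 2)
    }


def delayed(signal, d):
    """Shift a signal right by d samples, zero-padding at the start."""
    return [0.0] * d + signal[: len(signal) - d]


# Synthetic example: one source reaching three microphones with
# delays of 0, 2 and 3 samples (hypothetical values).
source = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5, 2.0, -3.0, 1.5, 0.0, 0.0, 0.0, 0.0]
channels = [delayed(source, d) for d in (0, 2, 3)]
tdoas = pairwise_tdoa(channels, max_lag=4)
print(tdoas)
```

With M microphones there are M(M-1)/2 such pairs, which is why a criterion for keeping only a subset of pairs can reduce both redundancy and computational cost.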

Index Terms: Speaker diarization, speaker localization, speaker identification, speaker segmentation

Bibliographic reference. Martínez-González, Beatriz / Pardo, José M. / Echeverry-Correa, Julián D. / Vallejo-Pinto, José A. / Barra-Chicote, Roberto (2012): "Selection of TDOA parameters for MDM speaker diarization", in INTERSPEECH-2012, 2158-2161.