INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

PLDA-Based Clustering for Speaker Diarization of Broadcast Streams

Jan Silovsky, Jan Prazak, Petr Cerva, Jindrich Zdansky, Jan Nouza

Technical University of Liberec, Czech Republic

This paper presents two approaches to speaker clustering based on Probabilistic Linear Discriminant Analysis (PLDA) for the speaker diarization task. We refer to them as the multifold-PLDA approach and the onefold-PLDA approach. In both approaches, a simple factor analysis model is employed to extract a low-dimensional representation of a sequence of acoustic feature vectors (so-called i-vectors), and these i-vectors are modeled using the PLDA model. Further, we examine two-stage clustering, with a Bayesian Information Criterion (BIC) based approach applied in the first stage and the PLDA-based approach in the second stage. We carried out our experiments using the COST278 multilingual broadcast news database. The best evaluated system yielded a 42% relative improvement in speaker error rate over a baseline BIC-based system.
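The abstract does not spell out the BIC formulation used in the first clustering stage. As a rough illustration only, the sketch below computes the widely used delta-BIC merge score for two segments, assuming each is modeled by a single full-covariance Gaussian. The function name `delta_bic`, the penalty weight `lam`, and the synthetic data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """Delta-BIC between modeling x and y separately vs. jointly.

    x, y: arrays of shape (frames, dims) holding acoustic feature vectors.
    lam:  penalty weight (hypothetical tuning parameter, often set near 1).
    A positive value favors keeping the segments apart; a negative value
    favors merging them into one cluster.
    """
    n1, d = x.shape
    n2 = y.shape[0]
    n = n1 + n2
    z = np.vstack([x, y])

    def logdet_cov(a):
        # Maximum-likelihood (biased) covariance estimate of the frames.
        cov = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Free parameters of one full-covariance Gaussian: mean + covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * lam * n_params * np.log(n)

    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    return gain - penalty

# Synthetic stand-ins for feature sequences (not real speech data).
rng = np.random.default_rng(0)
seg_a = rng.normal(0.0, 1.0, size=(500, 4))
seg_b = rng.normal(0.0, 1.0, size=(500, 4))   # same distribution as seg_a
seg_c = rng.normal(5.0, 1.0, size=(500, 4))   # clearly shifted distribution

same_score = delta_bic(seg_a, seg_b)
diff_score = delta_bic(seg_a, seg_c)
```

In an agglomerative first stage of this kind, the pair of clusters with the lowest score would typically be merged, and merging would stop once every remaining pair scores above zero. How the resulting clusters are handed to the PLDA-based second stage is not described in the abstract.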


Bibliographic reference.  Silovsky, Jan / Prazak, Jan / Cerva, Petr / Zdansky, Jindrich / Nouza, Jan (2011): "PLDA-based clustering for speaker diarization of broadcast streams", In INTERSPEECH-2011, 2909-2912.