12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Investigation of Cross-Show Speaker Diarization

Qian Yang (1), Qin Jin (2), Tanja Schultz (1)

(1) KIT, Germany
(2) Carnegie Mellon University, USA

The goal of cross-show diarization is to index the speech segments of speakers across a set of shows, with the particular challenge that speakers who reappear in multiple shows must be labeled with the same speaker identity. In this paper, we introduce three cross-show diarization systems, namely Global-BIC-Seg, Global-BIC-Cluster, and Incremental. We compare the three systems on a set of 46 English scientific podcast shows. Among the three systems, Global-BIC-Cluster achieves the best performance, with cross-show diarization error rates (DER) of 15.53% and 13.21% on the dev and test sets, respectively. However, an incremental approach is more practical, since data and shows are typically collected over time. By applying T-Norm to our incremental system, we obtain relative improvements in cross-show DER of 13.18% and 10.97% on the dev and test sets, respectively. We also investigate the impact of the show processing order on cross-show diarization for the incremental system.
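
For readers unfamiliar with the T-Norm step mentioned in the abstract, the following is a minimal sketch of test normalization in its generic form: a raw match score is standardized by the mean and standard deviation of the scores that the same test segment obtains against a cohort of other speaker models. The function name, the cohort values, and the use of the population standard deviation are illustrative assumptions; the abstract does not describe how the authors select the cohort or apply the normalized scores in their incremental system.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Standardize raw_score against the cohort score distribution.

    cohort_scores are the scores of the same test segment against a set
    of cohort (impostor) speaker models. Hypothetical helper, not the
    authors' code.
    """
    if len(cohort_scores) < 2:
        raise ValueError("need at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # population std; an assumption
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


# Made-up scores for illustration only.
normalized = t_norm(4.0, [1.0, 3.0])
```

Because the normalized score is expressed in cohort standard deviations, a single decision threshold can be shared across test segments whose raw score ranges differ, which is the usual motivation for this kind of normalization.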


Bibliographic reference. Yang, Qian / Jin, Qin / Schultz, Tanja (2011): "Investigation of cross-show speaker diarization", In INTERSPEECH-2011, 2925-2928.