The goal of cross-show diarization is to index the speech segments of speakers across a set of shows, with the particular challenge that speakers who reappear in several shows must be labeled with the same speaker identity. In this paper, we introduce three cross-show diarization systems, namely Global-BIC-Seg, Global-BIC-Cluster, and Incremental. We compare the three systems on a set of 46 English scientific podcast shows. Among the three, Global-BIC-Cluster achieves the best performance, with cross-show diarization error rates (DER) of 15.53% and 13.21% on the dev and test sets, respectively. However, an incremental approach is more practical, since data and shows are typically collected over time. By applying T-Norm to our incremental system, we obtain relative improvements of 13.18% and 10.97% in cross-show DER on the dev and test sets, respectively. We also investigate the impact of the show processing order on cross-show diarization for the incremental system.
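For readers unfamiliar with T-Norm (test normalization), the following is a minimal sketch of the standard definition: a raw match score is shifted and scaled by the mean and standard deviation of the scores that the same test segment obtains against a cohort of other speaker models. The abstract does not describe how the incremental system builds its cohort or where the normalized score is used, so the function name `t_norm`, its parameters, and the example scores are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Normalize a raw score with test normalization (T-Norm).

    raw_score: score of the test segment against the hypothesized speaker model.
    cohort_scores: scores of the same test segment against a cohort of other
        speaker models (hypothetical here; the cohort choice is an assumption).
    """
    if len(cohort_scores) < 2:
        raise ValueError("T-Norm needs at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # population standard deviation of the cohort
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


# Illustrative values only: cohort mean is 2.0, standard deviation is 1.0.
normalized = t_norm(4.0, [1.0, 3.0])
```

Because the normalized score is expressed in cohort standard deviations, a single decision threshold can plausibly be shared across segments whose raw scores sit on different scales, which is the usual motivation for applying score normalization in speaker matching.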
Bibliographic reference. Yang, Qian / Jin, Qin / Schultz, Tanja (2011): "Investigation of cross-show speaker diarization", In INTERSPEECH-2011, 2925-2928.