This work identifies the types of segments that are difficult for state-of-the-art speaker diarization systems. We analyze the diarization outputs of five state-of-the-art systems on short and long segments, as well as on segments surrounding speaker changepoints. For all five systems, the diarization error rate (DER) increased as segment duration decreased. The systems also performed much worse on segments immediately preceding and following speaker changepoints than on their respective counterparts. In fact, at least 40% of the DER for each of the five systems is attributable to time within 0.5 seconds of a speaker changepoint. We hope these results motivate future improvements to speaker diarization systems.
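As a rough illustration of the changepoint statistic above, the following minimal sketch computes the share of error time that falls within a fixed window of any speaker changepoint. It is not the authors' scoring code. It assumes the error regions have already been extracted as (start, end) intervals in seconds, and that the share of error time stands in for the share of DER, which holds when both are normalized by the same scored speech time. The function names and the `window` parameter are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_duration(a, b):
    """Total time shared by two sorted, disjoint interval lists."""
    total, i, j = 0.0, 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def error_share_near_changepoints(error_intervals, changepoints, window=0.5):
    """Fraction of error time lying within `window` seconds of a changepoint.

    Assumption: error_intervals are precomputed regions where the system
    output disagrees with the reference (hypothetical input format).
    """
    errors = merge_intervals(error_intervals)
    near = merge_intervals([(t - window, t + window) for t in changepoints])
    error_time = sum(end - start for start, end in errors)
    if error_time == 0:
        return 0.0
    return overlap_duration(errors, near) / error_time


# Hypothetical data: one error straddling a changepoint, one far from any.
share = error_share_near_changepoints(
    [(9.75, 10.25), (15.0, 16.0)], changepoints=[10.0, 20.0]
)
print(f"{share:.2f}")
```

In this made-up example, 0.5 seconds of the 1.5 seconds of error time lie inside the window, so about a third of the error is attributed to the changepoint region.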
Index Terms: speaker diarization, error analysis, rich transcription
Bibliographic reference. Knox, Mary Tai / Mirghafori, Nikki / Friedland, Gerald (2012): "Where did I go wrong?: identifying troublesome segments for speaker diarization systems", In INTERSPEECH-2012, 486-489.