INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Where did I go wrong?: Identifying Troublesome Segments for Speaker Diarization Systems

Mary Tai Knox (1,2), Nikki Mirghafori (1), Gerald Friedland (1)

(1) International Computer Science Institute, Berkeley, California, USA
(2) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA

This work identifies the types of segments that are difficult for state-of-the-art speaker diarization systems. We analyze the diarization outputs of five state-of-the-art systems on short and long segments, as well as on segments surrounding speaker changepoints. For all five systems, the diarization error rate (DER) increased as segment duration decreased. Segments immediately preceding and following speaker changepoints also had much higher error than their respective counterparts. In fact, for each of the five systems, at least 40% of the DER is attributable to time within 0.5 seconds of a speaker changepoint. We hope these results motivate future improvements to speaker diarization systems.
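To make the changepoint statistic concrete, the following is a minimal sketch of how the share of error time near speaker changepoints could be measured. It is not the authors' scoring procedure. It assumes that hypothesis speaker labels have already been mapped to reference labels, that segments do not overlap, and that a changepoint is the start of a reference segment whose speaker differs from the previous segment's. The names `Segment`, `label_at`, `changepoints`, and `error_share_near_changepoints`, and the 10 ms frame step, are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical segment representation: (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, str]


def label_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the speaker active at time t, or None for non-speech."""
    for start, end, speaker in segments:
        if start <= t < end:
            return speaker
    return None


def changepoints(reference: List[Segment]) -> List[float]:
    """Start times of reference segments whose speaker differs from the previous one."""
    ordered = sorted(reference)
    return [cur[0] for prev, cur in zip(ordered, ordered[1:]) if prev[2] != cur[2]]


def error_share_near_changepoints(
    reference: List[Segment],
    hypothesis: List[Segment],
    window: float = 0.5,  # seconds on either side of a changepoint
    step: float = 0.01,   # assumed frame step in seconds
) -> float:
    """Fraction of erroneous frames that lie within `window` of a changepoint.

    Assumes hypothesis labels are already mapped to reference labels.
    """
    cps = changepoints(reference)
    end_time = max(end for _, end, _ in reference)
    n_frames = int(round(end_time / step))
    total_err = 0
    near_err = 0
    for i in range(n_frames):
        t = (i + 0.5) * step  # sample at the frame midpoint
        if label_at(reference, t) != label_at(hypothesis, t):
            total_err += 1
            if any(abs(t - cp) <= window for cp in cps):
                near_err += 1
    return near_err / total_err if total_err else 0.0


# Toy data, invented for illustration only.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 5.3, "A"), (5.3, 9.0, "B"), (9.0, 10.0, "A")]
share = error_share_near_changepoints(reference, hypothesis)
print(f"{share:.2f}")  # 0.30 s of the 1.30 s in error is near the changepoint: 0.23
```

In this toy example the late changepoint detection (0.3 seconds) counts as error near the changepoint, while the one-second speaker confusion at the end does not. A real evaluation would also need the optimal speaker mapping and the handling of overlapped speech, which this sketch leaves out.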

Index Terms: speaker diarization, error analysis, rich transcription


Bibliographic reference. Knox, Mary Tai / Mirghafori, Nikki / Friedland, Gerald (2012): "Where did I go wrong?: Identifying troublesome segments for speaker diarization systems", in INTERSPEECH-2012, 486-489.