11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

An Integrated Top-Down/Bottom-Up Approach to Speaker Diarization

Simon Bozonnet (1), Nicholas Evans (1), Corinne Fredouille (2), Dong Wang (1), Raphaël Troncy (1)

(1) EURECOM, France
(2) LIA, France

Most speaker diarization systems fit into one of two categories: bottom-up or top-down. Bottom-up systems are the most popular but can suffer from instability caused by difficulties with merging and stopping criteria. Top-down systems deliver competitive results but are particularly prone to poor model initialization, which often leads to large variations in performance. This paper presents a new integrated bottom-up/top-down approach to speaker diarization which aims to harness the strengths of each system and thus to improve performance and stability. In contrast to previous work, here the two systems are fused at the heart of the segmentation and clustering stage. Experimental results show improvements in speaker diarization performance for both meeting and TV-show domain data, indicating increased intra- and inter-domain stability. On the TV-show data in particular, an average relative improvement of 26% in diarization error rate (DER) is obtained.

Bibliographic reference. Bozonnet, Simon / Evans, Nicholas / Fredouille, Corinne / Wang, Dong / Troncy, Raphaël (2010): "An integrated top-down/bottom-up approach to speaker diarization", in INTERSPEECH-2010, 2646-2649.