10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speaker Diarization for Meeting Room Audio

Hanwu Sun, Tin Lay Nwe, Bin Ma, Haizhou Li

Institute for Infocomm Research, Singapore

This paper describes a speaker diarization system for the Multiple Distant Microphone (MDM) task of the 2007 NIST Rich Transcription Meeting Recognition Evaluation (RT07), which targets meeting room scenarios. The system consists of three major modules: data preparation, initial speaker clustering, and cluster purification/merging. Data preparation comprises Wiener filtering of the raw audio, beamforming, Time Difference of Arrival (TDOA) estimation, and speech activity detection. On the preprocessed data, two-stage histogram quantization is used to perform the initial speaker clustering. A modified purification strategy based on high-order GMM clustering is proposed, and the Bayesian Information Criterion (BIC) is applied for cluster merging. The system achieves a competitive overall diarization error rate (DER) of 8.31% on the RT07 MDM speaker diarization task.
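The abstract states only that BIC is applied for cluster merging; it does not give the model form or the penalty weight. As a minimal sketch, assuming each cluster is modelled by a single full-covariance Gaussian over its feature frames (the classic ΔBIC formulation, not necessarily the variant used in this system), the merge test compares one model for the pooled data against two separate models:

ΔBIC = (N/2) log|Σ| - (N1/2) log|Σ1| - (N2/2) log|Σ2| - λ · (1/2)(d + d(d+1)/2) · log N

where N = N1 + N2 frames of dimension d. A positive value favours keeping the clusters apart; a value at or below zero marks them as merge candidates. The function names `delta_bic` and `should_merge`, the penalty weight `lam` and the diagonal `ridge` are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def mle_covariance(frames: Sequence[Vector], ridge: float = 1e-6) -> List[List[float]]:
    """Maximum-likelihood (divide-by-N) covariance of a list of feature frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    cov = [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in frames) / n
            for b in range(d)] for a in range(d)]
    for k in range(d):
        cov[k][k] += ridge  # assumption: small ridge keeps the matrix positive definite
    return cov


def log_det_spd(m: List[List[float]]) -> float:
    """log|M| for a symmetric positive-definite matrix via Cholesky factorization."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return log_det


def delta_bic(x: Sequence[Vector], y: Sequence[Vector], lam: float = 1.0) -> float:
    """Hypothetical single-Gaussian Delta-BIC between clusters x and y.

    Positive: two separate models are preferred (likely different speakers).
    Zero or negative: one model suffices (merge candidates).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    d = len(x[0])
    pooled = list(x) + list(y)
    # Log-likelihood gain of two Gaussians over one (constant terms cancel).
    fit = 0.5 * (n * log_det_spd(mle_covariance(pooled))
                 - n1 * log_det_spd(mle_covariance(x))
                 - n2 * log_det_spd(mle_covariance(y)))
    # Extra free parameters of the two-model hypothesis: one mean and one full covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * lam * n_params * math.log(n)
    return fit - penalty


def should_merge(x: Sequence[Vector], y: Sequence[Vector], lam: float = 1.0) -> bool:
    return delta_bic(x, y, lam) <= 0.0


# Toy usage on synthetic 2-D "feature frames" (not real speech features).
rng = random.Random(0)


def toy_cluster(mu: Vector, n: int = 200) -> List[List[float]]:
    return [[rng.gauss(mu[0], 1.0), rng.gauss(mu[1], 1.0)] for _ in range(n)]


same_a = toy_cluster((0.0, 0.0))
same_b = toy_cluster((0.0, 0.0))
other = toy_cluster((10.0, 0.0))
print(should_merge(same_a, same_b), should_merge(same_a, other))
```

In a full agglomerative merging stage, a test like this would typically be evaluated for every pair of current clusters, merging the pair with the lowest ΔBIC and repeating until no pair falls at or below zero; the stopping rule and λ used in the paper are not given in the abstract.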


Bibliographic reference.  Sun, Hanwu / Nwe, Tin Lay / Ma, Bin / Li, Haizhou (2009): "Speaker diarization for meeting room audio", In INTERSPEECH-2009, 900-903.