ISCA Archive Interspeech 2009

Speaker diarization for meeting room audio

Hanwu Sun, Tin Lay Nwe, Bin Ma, Haizhou Li

This paper describes a speaker diarization system developed for the 2007 NIST Rich Transcription (RT07) Meeting Recognition Evaluation, targeting the Multiple Distant Microphone (MDM) task in meeting room scenarios. The system comprises three major modules: data preparation, initial speaker clustering, and cluster purification/merging. Data preparation consists of Wiener filtering and beamforming of the raw data, Time Difference of Arrival (TDOA) estimation, and speech activity detection. On the preprocessed data, a two-stage histogram quantization performs the initial speaker clustering. A modified purification strategy based on high-order GMM clustering is proposed, and the Bayesian information criterion (BIC) is applied for cluster merging. The system achieves a competitive overall diarization error rate (DER) of 8.31% on the RT07 MDM speaker diarization task.
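
As an illustration of the BIC-based merging step mentioned in the abstract, the following is a minimal sketch of a commonly used delta-BIC merge test. It is not taken from the paper: it assumes each cluster is modelled by a single full-covariance Gaussian (the paper itself refers to GMMs), and the function names `delta_bic` and `should_merge`, as well as the `penalty_weight` parameter, are hypothetical.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for two feature sets of shape (n_frames, n_dims).

    Assumption (not from the paper): each cluster is modelled by one
    full-covariance Gaussian. A negative value means a single shared
    model is preferred, i.e. the clusters are candidates for merging.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(a):
        # Maximum-likelihood (biased) covariance estimate of the frames.
        cov = np.cov(a, rowvar=False, bias=True).reshape(d, d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Extra free parameters of two Gaussians versus one: mean + covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    return gain - penalty


def should_merge(x, y, penalty_weight=1.0):
    """Merge when modelling both clusters jointly is preferred by BIC."""
    return delta_bic(x, y, penalty_weight) < 0.0
```

In an agglomerative scheme of this kind, the pair of clusters with the lowest delta-BIC would typically be merged first, and merging would stop once no pair yields a negative value. Whether the system described here follows exactly this procedure is not stated in the abstract.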

doi: 10.21437/Interspeech.2009-271

Cite as: Sun, H., Nwe, T.L., Ma, B., Li, H. (2009) Speaker diarization for meeting room audio. Proc. Interspeech 2009, 900-903, doi: 10.21437/Interspeech.2009-271

  author={Hanwu Sun and Tin Lay Nwe and Bin Ma and Haizhou Li},
  title={{Speaker diarization for meeting room audio}},
  booktitle={Proc. Interspeech 2009},