9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Multi-Speaker Meeting Audio Segmentation

Tin Lay Nwe, Minghui Dong, Swe Zin Kalayar Khine, Haizhou Li

Institute for Infocomm Research, Singapore

This paper presents a method for segmenting multi-speaker meeting audio into four classes: local speech, crosstalk, overlapped speech and non-speech sounds. First, the Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting at speaker change points. Then, harmonicity information is integrated into the acoustic features to differentiate speech from non-speech audio segments. We use cascaded subband filters spread over pitch and harmonic frequency scales to characterize the harmonicity information. Finally, total energy and a multi-pitch tracking algorithm are used to classify speech segments into local speech, overlapped speech and crosstalk. Experiments conducted on a subset of the ICSI meeting corpus show promising results in classifying the four audio types.
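To illustrate the BIC pre-segmentation step, here is a minimal sketch of the widely used single change-point BIC test, which models each side of a candidate split with one full-covariance Gaussian. The abstract does not state which BIC variant, features, or penalty weight the authors used, so the function names (`delta_bic`, `best_change_point`), the `penalty_weight` and `margin` parameters, and the small covariance regularizer are all assumptions made for illustration.

```python
import numpy as np


def _log_det_cov(x, dim):
    # Log-determinant of the maximum-likelihood covariance of x.
    # The 1e-6 diagonal term is an assumed regularizer for numerical stability.
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + 1e-6 * np.eye(dim)
    _, value = np.linalg.slogdet(cov)
    return value


def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (frames x dims) at frame `split`.

    Positive values favor two Gaussians (a change point) over one.
    """
    n, dim = features.shape
    left, right = features[:split], features[split:]
    gain = 0.5 * (
        n * _log_det_cov(features, dim)
        - len(left) * _log_det_cov(left, dim)
        - len(right) * _log_det_cov(right, dim)
    )
    # Extra parameters of the two-model hypothesis: one mean and one covariance.
    n_params = dim + 0.5 * dim * (dim + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def best_change_point(features, margin=20, penalty_weight=1.0):
    """Return (frame index, score) of the best split, or (None, score) if none passes."""
    n = features.shape[0]
    scores = [delta_bic(features, i, penalty_weight) for i in range(margin, n - margin)]
    best = int(np.argmax(scores))
    if scores[best] > 0:
        return margin + best, scores[best]
    return None, scores[best]
```

In practice such a test is typically applied over a sliding or growing window of acoustic feature frames (for example cepstral features), with each detected change point starting a new window. That windowing strategy is likewise an assumption here, not a detail taken from the paper.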


Bibliographic reference. Nwe, Tin Lay / Dong, Minghui / Khine, Swe Zin Kalayar / Li, Haizhou (2008): "Multi-speaker meeting audio segmentation", in INTERSPEECH-2008, 2522-2525.