This paper presents a method for segmenting multi-speaker meeting audio into four classes: local speech, crosstalk, overlapped speech, and non-speech sounds. First, the Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting at speaker change points. Then, harmonicity information is integrated into the acoustic features to differentiate speech from non-speech segments. We use cascaded subband filters spread over pitch and harmonic frequency scales to characterize the harmonicity information. Finally, total energy and a multi-pitch tracking algorithm are used to classify the speech segments into local speech, overlapped speech, and crosstalk. Experiments conducted on a subset of the ICSI Meeting Corpus show promising results in classifying the four audio types.
Bibliographic reference. Nwe, Tin Lay / Dong, Minghui / Khine, Swe Zin Kalayar / Li, Haizhou (2008): "Multi-speaker meeting audio segmentation", In INTERSPEECH-2008, 2522-2525.
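The BIC pre-segmentation step mentioned in the abstract can be illustrated with a minimal sketch: for each candidate frame, compare a single full-covariance Gaussian fit to the whole window against two Gaussians fit to the halves, penalized by model complexity. This is a generic ΔBIC change-point detector, not the authors' actual implementation; the function names, the penalty weight `lam`, and the `margin` parameter are illustrative assumptions.

```python
import numpy as np

def delta_bic(X, t, lam=1.0):
    """Delta-BIC for a candidate change point t in a feature sequence X (N x d).

    Positive values favour the two-model hypothesis, i.e. a speaker
    change at frame t. `lam` weights the standard BIC complexity penalty.
    """
    N, d = X.shape

    def logdet_cov(Z):
        # log-determinant of the sample covariance (full-covariance Gaussian)
        _, ld = np.linalg.slogdet(np.cov(Z, rowvar=False))
        return ld

    # extra parameters of the two-model hypothesis: one more mean + covariance
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
    return (0.5 * N * logdet_cov(X)
            - 0.5 * t * logdet_cov(X[:t])
            - 0.5 * (N - t) * logdet_cov(X[t:])
            - lam * penalty)

def detect_change(X, margin=20, lam=1.0):
    """Return the best-scoring change frame, or None if no positive delta-BIC.

    `margin` keeps each half large enough for a stable covariance estimate.
    """
    scores = {t: delta_bic(X, t, lam) for t in range(margin, len(X) - margin)}
    t_best = max(scores, key=scores.get)
    return t_best if scores[t_best] > 0 else None
```

In practice such a detector is run over a sliding window of acoustic feature frames (e.g. MFCCs), growing the window until a change is found and then restarting after the detected boundary.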