ISCA Archive Interspeech 2008

Multi-speaker meeting audio segmentation

Tin Lay Nwe, Minghui Dong, Swe Zin Kalayar Khine, Haizhou Li

This paper presents a method for segmenting multi-speaker meeting audio into four classes: local speech, crosstalk, overlapped speech and non-speech sounds. First, a Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting at speaker change points. Then, harmonicity information is integrated into the acoustic features to differentiate speech from non-speech audio segments. We use cascaded subband filters spread over pitch and harmonic frequency scales to characterize the harmonicity information. Finally, total energy and a multi-pitch tracking algorithm are used to classify speech segments into local speech, overlapped speech and crosstalk. Experiments conducted on a subset of the ICSI meeting corpus show promising results in classifying the four audio types.
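
The BIC pre-segmentation step can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the paper, so the code below assumes the common textbook variant: each side of a candidate split is modelled by a single full-covariance Gaussian, and a change point is accepted when the likelihood gain exceeds the BIC penalty (delta BIC > 0). The function names `delta_bic` and `best_change_point`, the `penalty_weight` and `margin` parameters, and the small ridge added to the covariance are illustrative assumptions, not details from the paper.

```python
import numpy as np


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta BIC for splitting `frames` (n_frames x n_dims) at index `split`.

    Assumes one full-covariance Gaussian per segment (an assumption of this
    sketch). A positive value favours a change point at `split`.
    """
    n, d = frames.shape
    left, right = frames[:split], frames[split:]

    def logdet_cov(x):
        # Maximum-likelihood covariance plus a small ridge for numerical stability.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain of two Gaussians over one.
    gain = 0.5 * (
        n * logdet_cov(frames)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    # Extra free parameters of the second Gaussian: mean plus covariance.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def best_change_point(frames, margin=20, penalty_weight=1.0):
    """Return (index, delta BIC) of the best candidate split in the window.

    `margin` keeps enough frames on each side to estimate a covariance.
    """
    n = frames.shape[0]
    candidates = list(range(margin, n - margin))
    scores = [delta_bic(frames, s, penalty_weight) for s in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

In use, `frames` would hold per-frame acoustic features (for example cepstral coefficients) for a sliding analysis window; a window whose best score is positive would be cut at the returned index, and the resulting segments passed on to the speech/non-speech and energy/multi-pitch stages described above.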


doi: 10.21437/Interspeech.2008-625

Cite as: Nwe, T.L., Dong, M., Khine, S.Z.K., Li, H. (2008) Multi-speaker meeting audio segmentation. Proc. Interspeech 2008, 2522-2525, doi: 10.21437/Interspeech.2008-625

@inproceedings{nwe08_interspeech,
  author={Tin Lay Nwe and Minghui Dong and Swe Zin Kalayar Khine and Haizhou Li},
  title={{Multi-speaker meeting audio segmentation}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2522--2525},
  doi={10.21437/Interspeech.2008-625}
}