This paper presents a two-stage multi-feature integration approach for unsupervised speaker change detection in real-time news broadcasting. In the first stage, metric-based potential speaker change detection, we integrate MFCC and LSP features (i.e., a perceptual feature plus an articulatory feature) to collect as many speaker boundary candidates as possible. In the second stage, speaker boundary confirmation, we adopt a weighted Bayesian information criterion (BIC) to integrate the boundary decisions from the MFCC and LSP features. This multi-feature integration strategy exploits the complementarity between perceptual and articulatory features to achieve a performance gain. Speaker change detection experiments show that the multi-feature integration approach significantly outperforms the individual features, with relative improvements of 26% over the LSP-only approach and 6% over the MFCC-only approach.
Index Terms: speaker change detection, speaker segmentation, audio segmentation, audio content analysis
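As a rough illustration of the boundary confirmation idea, the sketch below computes a standard ΔBIC score for one candidate boundary in each feature stream and then combines the two scores. The abstract does not specify how the weighted BIC is formed, so the linear combination with weight `w_mfcc` is an assumption, as are the diagonal-covariance Gaussian model (a simplification that keeps the example dependency-free) and all function names, which are hypothetical.

```python
import math


def log_det_diag(frames):
    """Log-determinant of a diagonal ML covariance: sum of log per-dimension variances."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for j in range(dim):
        col = [f[j] for f in frames]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at index `split`.

    Positive values favour two separate Gaussians (a speaker change);
    negative values favour a single Gaussian (no change).
    Assumes diagonal covariances, which is a simplification.
    """
    n = len(frames)
    dim = len(frames[0])
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (
        n * log_det_diag(frames)
        - len(left) * log_det_diag(left)
        - len(right) * log_det_diag(right)
    )
    n_params = 2 * dim  # extra mean and variance per dimension for the second model
    return gain - penalty_weight * 0.5 * n_params * math.log(n)


def fused_decision(dbic_mfcc, dbic_lsp, w_mfcc=0.5):
    """Hypothetical fusion rule: confirm a boundary if the weighted sum is positive."""
    score = w_mfcc * dbic_mfcc + (1.0 - w_mfcc) * dbic_lsp
    return score > 0.0
```

In this sketch, a candidate produced by the first stage would be scored once on the MFCC frames and once on the LSP frames around it, and kept only if `fused_decision` returns `True`; the weight and the BIC penalty factor would need tuning on held-out data.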
Cite as: Xie, L., Wang, G.-S. (2008) A Two-stage Multi-feature Integration Approach to Unsupervised Speaker Change Detection in Real-time News Broadcasting. Proc. International Symposium on Chinese Spoken Language Processing, 350-353
@inproceedings{xie08_iscslp, author={Lei Xie and Guang-Sen Wang}, title={{A Two-stage Multi-feature Integration Approach to Unsupervised Speaker Change Detection in Real-time News Broadcasting}}, year=2008, booktitle={Proc. International Symposium on Chinese Spoken Language Processing}, pages={350--353} }