Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Unsupervised Audio Stream Segmentation and Clustering Via the Bayesian Information Criterion

Bowen Zhou, John H. L. Hansen

Robust Speech Processing Laboratory, The Center for Spoken Language Research, University of Colorado at Boulder, Boulder, CO, USA

In this paper, we propose an efficient approach for unsupervised audio stream segmentation and clustering via the Bayesian Information Criterion (BIC). The proposed method extends an earlier formulation by Chen and Gopalakrishnan [1]. In our segmentation formulation, Hotelling's T²-statistic is used to pre-select candidate segmentation boundaries, and BIC is then applied to make the segmentation decision. Our experiments on the DARPA Hub4 1997 evaluation data show that the final algorithm runs about 100 times faster than the algorithm in [1] and achieves a 7% reduction in miss rate, at the expense of a 6% increase in false alarm rate. In the clustering stage, Gaussian Mixture Models (GMMs) are used for gender labeling prior to hierarchical BIC-based clustering within each gender class. Our clustering experiments show that we can achieve a cluster purity of 99.3%.
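The two-stage segmentation decision and the BIC-based clustering can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on stated assumptions. Feature frames (e.g., MFCC vectors) are the rows of a matrix `x`. Each segment is modeled by a single full-covariance Gaussian. The BIC penalty weight `lam` defaults to 1.0. The T² statistic uses one covariance estimated over the whole analysis window. The function names and the `margin` parameter are hypothetical, and the GMM gender-labeling step is omitted.

```python
import numpy as np


def _logdet_cov(x):
    # Log-determinant of the maximum-likelihood (biased) covariance of the rows of x.
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, b, lam=1.0):
    """Delta-BIC for modeling x[:b] and x[b:] with separate Gaussians
    versus one Gaussian for all of x. A positive value favours a change at b."""
    n, d = x.shape
    # Penalty for the extra mean vector and full covariance matrix.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(x)
            - 0.5 * b * _logdet_cov(x[:b])
            - 0.5 * (n - b) * _logdet_cov(x[b:])
            - lam * penalty)


def hotelling_t2_scan(x, margin):
    """T^2 = b(n-b)/n * (mu1-mu2)^T S^-1 (mu1-mu2) for every b in [margin, n-margin).
    Assumption: S is the covariance of the whole window, inverted only once."""
    n = x.shape[0]
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False, bias=True)))
    csum = np.cumsum(x, axis=0)          # csum[b-1] is the sum of x[:b]
    bs = np.arange(margin, n - margin)
    mu1 = csum[bs - 1] / bs[:, None]
    mu2 = (csum[-1] - csum[bs - 1]) / (n - bs)[:, None]
    diff = mu1 - mu2
    t2 = (bs * (n - bs) / n) * np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return bs, t2


def detect_change(x, margin=10, lam=1.0):
    """Pre-select the boundary with the largest T^2, then accept it only if Delta-BIC > 0."""
    bs, t2 = hotelling_t2_scan(x, margin)
    b = int(bs[np.argmax(t2)])
    return b if delta_bic(x, b, lam) > 0 else None


def bic_cluster(segments, lam=1.0):
    """Greedy agglomerative clustering: repeatedly merge the pair of clusters with the
    most negative Delta-BIC, stopping when every remaining pair has Delta-BIC >= 0."""
    clusters = [np.asarray(s) for s in segments]
    labels = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = np.vstack([clusters[i], clusters[j]])
                score = delta_bic(merged, len(clusters[i]), lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        labels[i] += labels[j]
        del clusters[j], labels[j]   # j > i, so index i stays valid
    return labels
```

The T² scan needs only cumulative sums and a single matrix inverse per window, so every candidate boundary is scored cheaply and the full-covariance BIC test runs only once. This is one plausible reading of where the reported speedup comes from, not a description of the paper's exact procedure.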


[1] S. Chen and P. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion," Proc. Broadcast News Transcription and Understanding Workshop, pp. 127-132, Feb. 1998.


Bibliographic reference.  Zhou, Bowen / Hansen, John H. L. (2000): "Unsupervised audio stream segmentation and clustering via the Bayesian information criterion", In ICSLP-2000, vol.3, 714-717.