EUROSPEECH 2003 - INTERSPEECH 2003
We apply the Bayesian information criterion (BIC) to the unsupervised segmentation of two-way telephone conversations by speaker turn, and then cluster the resulting segments into homogeneous groups. Such clustering allows more accurate feature normalization and model adaptation for ASR-related tasks. In contrast to similar processing of broadcast data reported in previous work, we can safely assume that a call contains two distinguishable acoustic environments, but new challenges arise: a much higher rate of speaker change, variation in a talker's speaking style, and the presence of crosstalk and non-meaningful sounds. The algorithm is tested on two-speaker telephone conversations covering different speaker genders and different telephony networks (land-line and cellular). With the purity of the segments and of the final clusters as the performance measure, the BIC-based algorithm approaches the optimal result without requiring an iterative procedure.
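The abstract does not give the criterion itself, so the following is only a minimal sketch of the commonly used single-Gaussian Delta-BIC test for one candidate change point, not the authors' implementation. For brevity it assumes one-dimensional features. With d-dimensional feature vectors, the log-variances would become log-determinants of covariance matrices and the penalty would be lam * 0.5 * (d + d(d+1)/2) * log N. The function names, the `margin` parameter, and the penalty weight `lam` are hypothetical choices made for illustration.

```python
import math

def log_var(x):
    """Log of the maximum-likelihood variance of a 1-D sequence (floored to avoid log(0))."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return math.log(max(var, 1e-10))

def delta_bic(x, t, lam=1.0):
    """Delta-BIC for splitting x at index t; positive values favour a change point."""
    n = len(x)
    left, right = x[:t], x[t:]
    # Likelihood gain of modelling the two halves with separate Gaussians.
    gain = 0.5 * (n * log_var(x) - len(left) * log_var(left) - len(right) * log_var(right))
    # The split adds one mean and one variance (d = 1), hence 2 extra parameters.
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty

def best_change_point(x, margin=5, lam=1.0):
    """Return (index, score) of the best split; index is None if no split scores above zero."""
    candidates = range(margin, len(x) - margin + 1)
    t_best = max(candidates, key=lambda t: delta_bic(x, t, lam))
    score = delta_bic(x, t_best, lam)
    return (t_best if score > 0 else None), score
```

In this sketch a window is split only when the likelihood gain from two separate models outweighs the complexity penalty, which is why no separately tuned distance threshold is needed beyond the weight `lam`.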
Bibliographic reference. Zhong, Xin / Clements, Mark A. / Lim, Sung (2003): "Acoustic change detection and segment clustering of two-way telephone conversations", In EUROSPEECH-2003, 2925-2928.