This study presents an approach to speaker clustering using adaptive decision tree-based phone cluster models (DT-PCMs). First, a large broadcast news database is used to train a set of phone models for universal speakers. The multi-space probability distributed-hidden Markov model (MSD-HMM) is adopted for phone modeling. Confusing phone models are merged into phone clusters. Next, for each state in the phone MSD-HMMs, a decision tree is constructed to store the contextual, phonetic, and speaker characteristics for data sharing over all speakers. For speaker clustering, each input speech segment is used to retrieve the Gaussian models from the DT-PCMs to construct the initial speaker-dependent phone cluster models. Finally, all the corresponding adapted speaker-dependent phone cluster models are used for speaker clustering via a cross-likelihood ratio measure. The experimental results show the DT-PCMs outperforms the conventional GMM-based approach.
Bibliographic reference. Hsieh, Chia-Hsin / Wu, Chung-Hsien / Shen, Han-Ping (2008): "Adaptive decision tree-based phone cluster models for speaker clustering", In INTERSPEECH-2008, 861-864.