VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation

Xianhong Chen, Liang He, Can Xu, Yi Liu, Tianyu Liang, Jia Liu


The variational Bayes hidden Markov model (VB-HMM) is a soft speaker diarization system. It is often combined with fixed-length segmentation (FLS) instead of speaker change detection (SCD) to avoid SCD error propagation. However, because each segment is too short to provide enough speaker information, the emission probability (the probability of a segment given a speaker) is noisy and inaccurate. We therefore propose a VB-HMM speaker diarization system with enhanced and refined segment representation. First, it enhances the representation of each segment with its neighbors in the stream, drawing on more information from the same speaker to improve the accuracy of the emission probability. Then, during the iterations, it refines the segment representation using speaker change points to remove information about other speakers that the neighbors may carry. Experimental results on RT09 show that VB-HMM with enhanced and refined segment representation achieves a relative improvement of 22.9% over VB-HMM with FLS only.
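
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `context` parameter, and the use of mean pooling over frame features as the "segment representation" are all assumptions made here for illustration. The sketch only shows the idea of widening each fixed-length segment with its neighbors and then clipping that widened window at speaker change points.

```python
import numpy as np

def fixed_length_segments(n_frames, seg_len):
    """Split frame indices [0, n_frames) into consecutive fixed-length segments."""
    return [(s, min(s + seg_len, n_frames)) for s in range(0, n_frames, seg_len)]

def enhanced_spans(segments, context, change_points=()):
    """Return, for each segment, the frame span used to represent it.

    The span covers the segment plus `context` neighboring segments on each
    side (enhancement). If hypothesized change points are given, the span is
    clipped so that it does not cross the nearest change point on either side
    of the segment (refinement). Change points that fall strictly inside a
    segment are ignored in this simplified sketch.
    """
    spans = []
    last = len(segments) - 1
    for i, (start, end) in enumerate(segments):
        lo = segments[max(i - context, 0)][0]
        hi = segments[min(i + context, last)][1]
        left = [cp for cp in change_points if lo < cp <= start]
        right = [cp for cp in change_points if end <= cp < hi]
        if left:
            lo = max(left)   # nearest change point before the segment
        if right:
            hi = min(right)  # nearest change point after the segment
        spans.append((lo, hi))
    return spans

def segment_representations(features, spans):
    """Mean-pool frame features over each span (a stand-in representation)."""
    return np.stack([features[lo:hi].mean(axis=0) for lo, hi in spans])

# Toy example: 10 frames of 1-dimensional features, segments of 2 frames.
features = np.arange(10, dtype=float).reshape(10, 1)
segments = fixed_length_segments(10, 2)
enhanced = enhanced_spans(segments, context=1)
refined = enhanced_spans(segments, context=1, change_points=[4])
reps = segment_representations(features, refined)
```

In this toy example, a change point at frame 4 keeps the segments before it from borrowing frames after it, and vice versa. In an iterative system the change points would presumably be re-estimated from the current diarization output on each pass; that loop is omitted here.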


DOI: 10.21437/Odyssey.2018-19

Cite as: Chen, X., He, L., Xu, C., Liu, Y., Liang, T., Liu, J. (2018) VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 134-139, DOI: 10.21437/Odyssey.2018-19.


@inproceedings{Chen2018,
  author={Xianhong Chen and Liang He and Can Xu and Yi Liu and Tianyu Liang and Jia Liu},
  title={VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={134--139},
  doi={10.21437/Odyssey.2018-19},
  url={http://dx.doi.org/10.21437/Odyssey.2018-19}
}