Speaker segmentation is widely used in tasks such as multi-speaker detection and speaker tracking. Its performance depends to a large extent on the accuracy of speaker verification (SV) between two short utterances, so improving SV for short utterances should substantially improve segmentation. In this paper, a method based on phoneme recognition and text-dependent speaker recognition is proposed. During segmentation, a phoneme recognizer first produces a phoneme sequence; text-dependent speaker recognition based on dynamic time warping (DTW) is then performed on occurrences of the same phoneme in two adjacent windows. Experiments on the Chinese Corpus Consortium (CCC) MSS database showed that the proposed method outperformed both the Bayesian information criterion (BIC) method and the generalized likelihood ratio (GLR) method.
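The comparison step described above, DTW applied to occurrences of the same phoneme in two adjacent windows, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the acoustic features, the local distance, the normalization, or how per-phoneme scores are combined, so the Euclidean frame distance, the division by the combined sequence length, the averaging over shared phonemes, and the names `dtw_distance` and `window_distance` are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Classic DTW between two sequences of feature vectors.

    Returns the accumulated alignment cost divided by len(seq_a) + len(seq_b).
    That normalization is an assumption, not taken from the paper.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in seq_a only
                                 cost[i][j - 1],      # step in seq_b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m] / (n + m)


def window_distance(left, right):
    """Compare two adjacent windows phoneme by phoneme.

    `left` and `right` map a phoneme label to a list of occurrences, each
    occurrence being a sequence of feature vectors. Only phonemes present in
    both windows are compared. Returns the mean DTW distance over all such
    pairs (a hypothetical combination rule), or None if no phoneme is shared.
    """
    scores = []
    for phoneme in set(left) & set(right):
        for occ_a in left[phoneme]:
            for occ_b in right[phoneme]:
                scores.append(dtw_distance(occ_a, occ_b))
    return sum(scores) / len(scores) if scores else None
```

Under these assumptions, a larger `window_distance` between two adjacent windows would suggest a speaker change near their boundary; the decision threshold would have to be tuned on held-out data and is not given in the abstract.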
Bibliographic reference. Wang, Gang / Wu, Xiaojun / Zheng, Thomas Fang (2010): "Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech", In INTERSPEECH-2010, 1457-1460.