Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge

Yi Liu, Yao Tian, Liang He, Jia Liu


Collecting training data for real-world text-independent speaker recognition is challenging. In practice, the utterances of a specific speaker are often mixed with many other acoustic signals. To guarantee recognition performance, the segments spoken by the target speaker must be picked out precisely. Automatic detection can reduce the cost of manual annotation. One way to achieve this is to use speaker diarization as a pre-processing step in the speaker enrollment phase. To this end, this paper investigates three speaker diarization algorithms, based on the Bayesian information criterion (BIC), the agglomerative information bottleneck (aIB), and i-vectors, and studies their impact on the results of a speaker recognition system. Experiments conducted on the Speaker in the Wild (SITW) Speaker Recognition Challenge (SRC) 2016 show that a suitable speaker diarization method improves overall performance. Combinations of these methods are also explored.


DOI: 10.21437/Interspeech.2016-1144

Cite as

Liu, Y., Tian, Y., He, L., Liu, J. (2016) Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge. Proc. Interspeech 2016, 853-857.

BibTeX
@inproceedings{Liu+2016,
author={Yi Liu and Yao Tian and Liang He and Jia Liu},
title={Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1144},
url={http://dx.doi.org/10.21437/Interspeech.2016-1144},
pages={853--857}
}