INTERSPEECH 2014

In this paper, we attempt to quantify the amount of labeled data necessary to build a state-of-the-art speaker recognition system. We begin by using i-vectors and the cosine similarity metric to represent an unlabeled set of utterances, then obtain labels from a noiseless oracle in the form of pairwise queries. Finally, we use the resulting speaker clusters to train a PLDA scoring function, which is assessed on the 2010 NIST Speaker Recognition Evaluation. After presenting the initial results of an algorithm that sorts queries based on nearest-neighbor pairs, we develop techniques that further minimize the number of queries needed to obtain state-of-the-art performance. We demonstrate, anecdotally, that our methods generalize by applying them to two different distributions of utterances-per-speaker and, ultimately, find that the number of pairwise labels actually needed to obtain state-of-the-art results may be a mere fraction of the queries required to fully label the entire set of utterances.
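The query loop described above can be illustrated with a minimal sketch, assuming a noiseless oracle and a simplified ordering: every pair of utterances is ranked by the cosine similarity of its i-vectors, and the most similar pairs are queried first. This is a simplification of the nearest-neighbor ordering the abstract mentions, not the authors' algorithm. The names `UnionFind`, `nn_ordered_queries`, `oracle`, and `budget` are hypothetical, and PLDA training on the resulting clusters is not shown. Because the oracle is noiseless, a pair whose answer is already implied (same cluster via earlier "yes" answers, or different clusters via an earlier "no" between them) is skipped rather than queried.

```python
import numpy as np


def cosine_matrix(ivecs):
    # Length-normalize each i-vector so that dot products equal cosine similarities.
    x = ivecs / np.linalg.norm(ivecs, axis=1, keepdims=True)
    return x @ x.T


class UnionFind:
    """Disjoint sets holding the speaker clusters built from 'same speaker' answers."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def known_different(uf, negatives, i, j):
    # With a noiseless oracle, one 'no' between two clusters answers every pair across them.
    ri, rj = uf.find(i), uf.find(j)
    for a, b in negatives:
        ra, rb = uf.find(a), uf.find(b)
        if (ra, rb) in ((ri, rj), (rj, ri)):
            return True
    return False


def nn_ordered_queries(ivecs, oracle, budget=float("inf")):
    """Hypothetical sketch: query pairs in descending cosine similarity until the budget runs out."""
    n = len(ivecs)
    sim = cosine_matrix(ivecs)
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(-sim[iu, ju], kind="stable")  # most similar pairs first
    uf, negatives, n_queries = UnionFind(n), [], 0
    for k in order:
        if n_queries >= budget:
            break
        i, j = int(iu[k]), int(ju[k])
        if uf.find(i) == uf.find(j) or known_different(uf, negatives, i, j):
            continue  # answer already implied; no query needed
        n_queries += 1
        if oracle(i, j):
            uf.union(i, j)
        else:
            negatives.append((i, j))
    labels = [uf.find(i) for i in range(n)]
    return labels, n_queries


if __name__ == "__main__":
    # Toy "i-vectors": utterances 0-1, 2-3, and 4 come from three different speakers.
    ivecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]])
    speaker = [0, 0, 1, 1, 2]
    labels, n_queries = nn_ordered_queries(ivecs, lambda i, j: speaker[i] == speaker[j])
    print(labels, n_queries)  # [0, 0, 2, 2, 4] 5
```

On this toy set, the three speaker clusters are recovered with 5 queries instead of the 10 needed to label every pair explicitly. Passing a smaller `budget` stops early and leaves some utterances in singleton clusters, which is the trade-off between query cost and label completeness that the paper studies.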
Bibliographic reference. Shum, Stephen H. / Dehak, Najim / Glass, James R. (2014): "Limited labels for unlimited data: active learning for speaker recognition", In INTERSPEECH-2014, 383-387.