In this paper we describe the speaker identification feature of the BBC World Service Archive prototype, an experiment run by BBC R&D to investigate alternative ways of publishing large radio archives. This feature relies on diarization of individual programmes, supervector-based speaker models, crowdsourcing for speaker identities, and a fast distributed index based on Locality Sensitive Hashing techniques to propagate these identities. We also describe how crowdsourced data can be used to continuously evaluate and refine our mapping from speaker models to speaker identities. We believe this experiment is one of the largest of its kind.
Bibliographic reference. Raimond, Yves / Nixon, Thomas (2014): "Identifying contributors in the BBC world service archive", In INTERSPEECH-2014, 81-85.