Odyssey 2010: The Speaker and Language Recognition Workshop
Brno, Czech Republic
Speaker verification systems (SVS) have shown significant progress and have reached a level of performance that makes their use in practical applications possible. Nevertheless, large differences in performance are observed depending on the speaker or the speech excerpt used. This variability calls for an analysis of system performance that goes deeper than the average error rate. In this paper, the effect of the training excerpt is investigated using ALIZE/SpkDet on two different corpora: NIST-SRE 08 (conversational speech) and BREF 120 (controlled read speech). The results show that SVS performance is highly dependent on the voice samples used to train the speaker model: the overall Equal Error Rate (EER) ranges from 4.1% to 29.1% on NIST-SRE 08 and from 1.0% to 33.0% on BREF 120. The hypothesis that such performance differences are explained by the phonetic content of the voice samples is studied on BREF 120.
Bibliographic reference. Kahn, Juliette / Audibert, Nicolas / Rossato, Solange / Bonastre, Jean-François (2010): "Intra-speaker variability effects on Speaker Verification performance", In Odyssey-2010, paper 021.