Inter-Task System Fusion for Speaker Recognition

M. Ferras, Srikanth Madikeri, S. Dey, Petr Motlicek, Hervé Bourlard

Fusion is a common approach to improving the performance of speaker recognition systems. Multiple systems using different data, features or algorithms tend to bring complementary contributions to the final decisions. Factors such as native language or accent are known to contribute to speaker identity. In this paper, we explore inter-task fusion approaches that incorporate side information from accent and language identification systems to improve the performance of a speaker verification system. We explore both score-level and model-level approaches, based on linear logistic regression and linear discriminant analysis respectively, and report significant gains on accented and multilingual data sets from the NIST Speaker Recognition Evaluation 2008 data. Equal error rate and expected rank metrics are reported for the speaker verification and speaker identification tasks.
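As an illustration of the score-level idea, the following is a minimal sketch of linear logistic regression fusion on synthetic data. It is not the authors' implementation: the abstract does not specify the fusion inputs or the training procedure, so the two-column score matrix (a speaker verification score plus one side-information score), the Gaussian toy data, the plain gradient-descent optimizer, and the function names `fit_linear_fusion`, `fuse` and `equal_error_rate` are all assumptions made for this example.

```python
import numpy as np

def fit_linear_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit w, b so that sigmoid(scores @ w + b) predicts target trials.

    Plain batch gradient descent on the logistic loss (an assumption;
    any logistic regression solver could be used instead).
    """
    n_trials, n_systems = scores.shape
    w = np.zeros(n_systems)
    b = 0.0
    for _ in range(n_iter):
        logits = scores @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels                  # gradient of the loss w.r.t. logits
        w -= lr * (scores.T @ err) / n_trials
        b -= lr * err.mean()
    return w, b

def fuse(scores, w, b):
    """Fused score: a weighted sum of the per-system scores plus an offset."""
    return scores @ w + b

def equal_error_rate(scores, labels):
    """Approximate EER by sweeping the decision threshold over observed scores."""
    thresholds = np.sort(scores)
    tar = scores[labels == 1]
    non = scores[labels == 0]
    false_reject = np.array([(tar < t).mean() for t in thresholds])
    false_accept = np.array([(non >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(false_reject - false_accept))
    return (false_reject[i] + false_accept[i]) / 2.0

# Synthetic toy trials (hypothetical, not NIST SRE 2008 data).
rng = np.random.default_rng(0)
n = 500
labels = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = target, 0 = non-target
speaker = np.concatenate([rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n)])
side_info = np.concatenate([rng.normal(0.5, 1.0, n), rng.normal(-0.5, 1.0, n)])
scores = np.column_stack([speaker, side_info])

w, b = fit_linear_fusion(scores, labels)
fused = fuse(scores, w, b)
eer_speaker = equal_error_rate(speaker, labels)
eer_fused = equal_error_rate(fused, labels)
print("fusion weights:", w, "offset:", b)
print("EER speaker only:", eer_speaker, "EER fused:", eer_fused)
```

In this sketch the fusion is trained and evaluated on the same trials for brevity; a realistic setup would fit the weights on a held-out development set and report the equal error rate on separate evaluation trials.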

DOI: 10.21437/Interspeech.2016-1179

Cite as

Ferras, M., Madikeri, S., Dey, S., Motlicek, P., Bourlard, H. (2016) Inter-Task System Fusion for Speaker Recognition. Proc. Interspeech 2016, 1810-1814.

author={M. Ferras and Srikanth Madikeri and S. Dey and Petr Motlicek and Hervé Bourlard},
title={Inter-Task System Fusion for Speaker Recognition},
booktitle={Interspeech 2016},