It is generally acknowledged that duration variability has a large effect on the biometric performance of speaker recognition systems. State-of-the-art approaches, which employ i-vector representations, apply adaptive spherical (AS) score normalization to improve the performance of the underlying system by using specific statistics on reference and probe templates obtained from additional datasets. Although the change in signal duration from reference to probe samples, most likely a reduction, is unpredictable, incorporating duration information turns out to be vital to prevent a significant rise in entropy. In this paper, we propose a duration-invariant extension of AS-Norm that computes more robust scores over a wide range of duration variability. The presented technique requires less computational effort at verification time and yields a 19% relative gain in minimum detection cost on the current NIST i-vector challenge database, compared to the NIST-provided i-vector baseline system.
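To make the idea of cohort-based adaptive score normalization concrete, the following is a minimal sketch of a common top-N symmetric formulation. It is not the duration-invariant extension proposed in this paper. It assumes cosine scoring between i-vectors and a cohort of i-vectors drawn from an additional dataset. For each side of a trial, the N highest cohort scores supply a mean and standard deviation, and the two z-normalized scores are averaged. The names `as_norm`, `top_n_stats`, `top_n`, and `eps` are hypothetical and were chosen for illustration only.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two i-vectors given as float sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_n_stats(template, cohort, top_n):
    """Mean and (population) std of the top_n highest cohort scores.

    Hypothetical helper: the adaptive part of the normalization is that
    only the cohort members most similar to this template are used.
    """
    scores = sorted((cosine(template, c) for c in cohort), reverse=True)[:top_n]
    return mean(scores), pstdev(scores)


def as_norm(enroll, probe, cohort, top_n=3, eps=1e-9):
    """Symmetric adaptive normalization of a single trial (assumed form).

    Averages the raw score normalized by reference-side and probe-side
    cohort statistics; eps guards against a zero standard deviation.
    """
    raw = cosine(enroll, probe)
    mu_e, sd_e = top_n_stats(enroll, cohort, top_n)
    mu_p, sd_p = top_n_stats(probe, cohort, top_n)
    return 0.5 * ((raw - mu_e) / (sd_e + eps) + (raw - mu_p) / (sd_p + eps))


if __name__ == "__main__":
    cohort = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
    enroll = [1, 0.2, 0]
    print(as_norm(enroll, [1, 0.25, 0], cohort))  # same-direction probe
    print(as_norm(enroll, [0, 0, 1], cohort))     # orthogonal probe
```

Because both the reference-side and probe-side statistics are computed from the cohort at scoring time, the cost of this baseline grows with cohort size per trial. The paper's claim of reduced computational effort at verification time concerns its own extension, which this sketch does not reproduce.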
Cite as: Nautsch, A., Rathgeb, C., Busch, C., Reininger, H., Kasper, K. (2014) Towards Duration Invariance of i-Vector-based Adaptive Score Normalization. Proc. The Speaker and Language Recognition Workshop (Odyssey 2014), 60-67, doi: 10.21437/Odyssey.2014-9
@inproceedings{nautsch14_odyssey, author={Andreas Nautsch and Christian Rathgeb and Christoph Busch and Herbert Reininger and Klaus Kasper}, title={{Towards Duration Invariance of i-Vector-based Adaptive Score Normalization}}, year=2014, booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)}, pages={60--67}, doi={10.21437/Odyssey.2014-9} }