Odyssey 2012 - The Speaker and Language Recognition Workshop
Dataset shift is a widely studied problem in the field of speaker recognition. Among the different types of dataset shift, covariate shift is the most common in real scenarios. Traditional solutions to covariate shift have been developed in the context of channel and session variability, and rely on large datasets to train models for channel/session compensation. However, in real applications, it is not always possible to obtain a large matched dataset to train these models.
This work analyzes which stages of an i-vector system are most vulnerable to covariate shift, and proposes different techniques to mitigate this effect. The proposed techniques operate under the assumption that little matched data is available for development. These techniques are evaluated in a scenario where covariate shift is simulated by introducing language shift. Among the proposed techniques, the most promising one is i-vector adaptation based on mean centering and length normalization.
However, the proposed techniques are not enough to reduce the wide gap in accuracy that appears in the presence of covariate shift.
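As an illustration of the mean centering and length normalization step mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes i-vectors are stored as rows of a NumPy array and that a small matched adaptation set is available to estimate the mean; the function name `center_and_length_normalize` and the variable names are hypothetical.

```python
import numpy as np


def center_and_length_normalize(ivectors, adaptation_set, eps=1e-12):
    """Center i-vectors with a mean estimated on matched data, then scale to unit length.

    ivectors:       array of shape (n, d), the i-vectors to adapt.
    adaptation_set: array of shape (m, d), the (small) matched set used
                    to estimate the mean. This is an assumed setup.
    """
    ivectors = np.asarray(ivectors, dtype=float)
    adaptation_set = np.asarray(adaptation_set, dtype=float)

    # Mean centering: subtract the mean of the matched adaptation data.
    mean = adaptation_set.mean(axis=0)
    centered = ivectors - mean

    # Length normalization: project every centered i-vector onto the unit sphere.
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.maximum(norms, eps)


# Synthetic example: data whose mean is shifted away from the origin.
rng = np.random.default_rng(0)
matched_dev = rng.normal(loc=2.0, scale=1.0, size=(50, 8))
test_ivectors = rng.normal(loc=2.0, scale=1.0, size=(5, 8))
adapted = center_and_length_normalize(test_ivectors, matched_dev)
```

Estimating the mean on matched data rather than on the original development set is the only adaptation in this sketch; the rest of the back end is left unchanged.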
Bibliographic reference. Vaquero, Carlos (2012): "Dataset shift in PLDA based speaker verification", In Odyssey-2012, 39-46.