INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

In-Domain versus Out-of-Domain Training for Text-Dependent JFA

Patrick Kenny (1), Themos Stafylakis (1), M. J. Alam (1), Pierre Ouellet (1), Marcel Kockmann (2)

(1) CRIM, Canada
(2) VoiceTrust, Canada

We propose a simple and effective strategy for coping with dataset shift in text-dependent speaker recognition based on Joint Factor Analysis (JFA). We have previously shown how to compensate for lexical variation in text-dependent JFA by adapting the Universal Background Model (UBM) to individual passphrases. A similar type of adaptation can be used to port a JFA model trained on out-of-domain data to a given text-dependent task domain. On the RSR2015 test set, we found that this type of adaptation gave essentially the same results as in-domain JFA training. To explore this idea more fully, we experimented with several types of JFA model on the CSLU speaker recognition dataset. Taking a suitably configured JFA model trained on NIST data and adapting it in the proposed way yields a 22% reduction in error rates compared with the GMM/UBM benchmark. Error rates remain much higher than those achievable on the RSR2015 test set with the same strategy, but cheating experiments suggest that, if large amounts of in-domain training data are available, JFA modelling is capable in principle of achieving very low error rates even on hard tasks such as CSLU.
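To make the passphrase-adaptation step concrete, the following is a minimal Python/NumPy sketch of Reynolds-style relevance-MAP adaptation of UBM means to the frames of a single passphrase. The function name, the relevance factor, and the mean-only update are illustrative assumptions, not the paper's exact recipe (the same kind of update could also be applied with pooled in-domain data to port an out-of-domain model, as the abstract describes).

    import numpy as np

    def map_adapt_ubm_means(ubm_weights, ubm_means, ubm_covars, frames,
                            relevance=16.0):
        """Relevance-MAP adaptation of UBM means to passphrase data.

        A sketch of mean-only MAP adaptation (Reynolds et al.), assuming
        diagonal covariances; not necessarily the adaptation used in the paper.

        ubm_weights: (C,)   mixture weights
        ubm_means:   (C, D) component means
        ubm_covars:  (C, D) diagonal covariances
        frames:      (T, D) acoustic feature vectors from the passphrase
        relevance:   MAP relevance factor r
        """
        C, D = ubm_means.shape

        # Log-likelihood of each frame under each diagonal-covariance component
        log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(ubm_covars), axis=1))
        diff = frames[:, None, :] - ubm_means[None, :, :]            # (T, C, D)
        log_lik = log_norm - 0.5 * np.sum(diff ** 2 / ubm_covars, axis=2)

        # Component posteriors (responsibilities), normalized per frame
        log_post = np.log(ubm_weights) + log_lik                     # (T, C)
        log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
        post = np.exp(log_post)

        # Zeroth- and first-order Baum-Welch statistics
        n = post.sum(axis=0)                                         # (C,)
        f = post.T @ frames                                          # (C, D)

        # Mean-only MAP update: interpolate the data mean with the UBM prior mean
        alpha = (n / (n + relevance))[:, None]
        data_mean = f / np.maximum(n, 1e-10)[:, None]
        return alpha * data_mean + (1.0 - alpha) * ubm_means

Under this reading, the adapted means would serve as the passphrase-specific (or domain-specific) UBM from which JFA enrollment and scoring proceed; components with little occupancy (small n) stay close to the prior means, which is what makes the adaptation safe with limited data.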

Bibliographic reference.  Kenny, Patrick / Stafylakis, Themos / Alam, M. J. / Ouellet, Pierre / Kockmann, Marcel (2014): "In-domain versus out-of-domain training for text-dependent JFA", In INTERSPEECH-2014, 1332-1336.