11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Decoupling Session Variability Modelling and Speaker Characterisation

Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre

LIA, France

The Factor Analysis (FA) framework has demonstrated a strong ability to model session variability over the past years. However, training the FA parameters requires a large amount of training data. When the available database is limited, the number of components of the core statistical model, the UBM, is also limited, because the UBM determines the dimension of the main FA matrix. Since the size of the UBM directly gives the size of the speaker supervector (the concatenation of the GMM mean parameters), it also limits the intrinsic capacity of the recognition system and lowers the expected performance. This paper aims to remove this limitation by breaking the intrinsic link between the FA dimensionality and the UBM dimensionality. Session variability is modelled in a space of smaller dimension than that of the UBM, while the UBM continues to drive the discriminative power of the system. The first experimental results presented in this paper, obtained within the NIST-SRE 2008 framework, are encouraging: associating a 512-component UBM with a 32-component session variability model yields a relative EER improvement of about 18% over a 32-component UBM associated with the same variability model.
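The coupling described above can be made concrete with a little arithmetic. The sketch below is an illustration only, not the authors' implementation: it builds a mean supervector by concatenating per-component GMM means, counts the entries of a session variability matrix U of shape (C·F) × R, and applies the classical FA session shift m + Ux. The feature dimension (50) and session subspace rank (40) are hypothetical values chosen for the example, and the helper names are likewise hypothetical.

```python
from typing import List, Sequence


def supervector(means: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-component GMM mean vectors into one supervector."""
    return [value for mean in means for value in mean]


def supervector_dim(n_components: int, feat_dim: int) -> int:
    """Dimension of a mean supervector: one feat_dim block per GMM component."""
    return n_components * feat_dim


def session_matrix_size(n_components: int, feat_dim: int, rank: int) -> int:
    """Number of entries in a session variability matrix U of shape (C*F) x R."""
    return supervector_dim(n_components, feat_dim) * rank


def add_session_offset(
    m: Sequence[float], U: Sequence[Sequence[float]], x: Sequence[float]
) -> List[float]:
    """Return m + U x, a session-shifted supervector in the classical FA model."""
    if len(U) != len(m):
        raise ValueError("U must have one row per supervector dimension")
    if any(len(row) != len(x) for row in U):
        raise ValueError("each row of U must have one entry per session factor")
    return [m_i + sum(u_ij * x_j for u_ij, x_j in zip(row, x)) for m_i, row in zip(m, U)]


if __name__ == "__main__":
    FEAT_DIM = 50  # hypothetical acoustic feature dimension
    RANK = 40      # hypothetical session subspace rank
    for n_components in (512, 32):
        print(
            n_components,
            supervector_dim(n_components, FEAT_DIM),
            session_matrix_size(n_components, FEAT_DIM, RANK),
        )
```

Under these assumed dimensions, U tied to a 512-component UBM has 16 times as many entries as U tied to a 32-component model. This is the estimation burden that motivates modelling session variability at a smaller dimension than the UBM.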


Bibliographic reference.  Larcher, Anthony / Lévy, Christophe / Matrouf, Driss / Bonastre, Jean-François (2010): "Decoupling session variability modelling and speaker characterisation", In INTERSPEECH-2010, 2314-2317.