ISCA Archive Interspeech 2008

Intersession variability in speaker recognition: a behind the scene analysis

Daniel Garcia-Romero, Carol Y. Espy-Wilson

The representation of a speaker's identity by means of Gaussian supervectors (GSV) is at the heart of most state-of-the-art speaker recognition systems. In this paper we present a novel procedure for visualizing GSVs that yields qualitative insight into the information they capture. Based on this visualization approach, the Switchboard-I database (SWB-I) is used to study the relationship between a data-driven partition of the acoustic space and a knowledge-based partition (i.e., broad phonetic classes). Moreover, the structure of an intersession variability subspace (IVS), computed from the SWB-I database, is analyzed by displaying the projection of a speaker's GSV onto the set of eigenvectors with the highest eigenvalues. This analysis reveals a strong presence of linguistic information in the IVS components with the highest energy. Finally, after projecting away the information contained in the IVS from the speaker's GSV, a visualization of the resulting GSV provides information about a speaker's characteristic patterns of spectral energy allocation.
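
The two IVS operations named in the abstract (projecting a GSV onto the leading eigenvectors, and projecting the IVS away from a GSV) can be illustrated with a minimal NumPy sketch. The abstract does not say how the IVS is estimated, so this sketch assumes it is spanned by the top eigenvectors of the within-speaker scatter of the supervectors; the function names `intersession_subspace`, `project_onto`, and `remove_subspace` are hypothetical and not taken from the paper.

```python
import numpy as np


def intersession_subspace(supervectors, speaker_ids, rank):
    """Return an orthonormal basis (dim x rank) for within-speaker variability.

    Assumption: the subspace is spanned by the leading eigenvectors of the
    within-speaker scatter matrix. The paper may estimate it differently.
    """
    X = np.asarray(supervectors, dtype=float)
    ids = np.asarray(speaker_ids)
    centered = np.empty_like(X)
    # Subtract each speaker's mean so that only session-to-session
    # deviations remain in the data.
    for spk in np.unique(ids):
        mask = ids == spk
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    # The right singular vectors of the centered data are the eigenvectors
    # of the within-speaker scatter, ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def project_onto(basis, gsv):
    """Coordinates of a supervector along each basis eigenvector."""
    return basis.T @ np.asarray(gsv, dtype=float)


def remove_subspace(basis, gsv):
    """Project the subspace away, keeping only the orthogonal remainder."""
    gsv = np.asarray(gsv, dtype=float)
    return gsv - basis @ (basis.T @ gsv)
```

In this sketch, `project_onto` gives the coefficients one would display to inspect what the highest-energy IVS components capture, and `remove_subspace` gives the compensated supervector that would then be visualized.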

doi: 10.21437/Interspeech.2008-409

Cite as: Garcia-Romero, D., Espy-Wilson, C.Y. (2008) Intersession variability in speaker recognition: a behind the scene analysis. Proc. Interspeech 2008, 1413-1416, doi: 10.21437/Interspeech.2008-409

  author={Daniel Garcia-Romero and Carol Y. Espy-Wilson},
  title={{Intersession variability in speaker recognition: a behind the scene analysis}},
  booktitle={Proc. Interspeech 2008},