9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Intersession Variability in Speaker Recognition: A Behind the Scene Analysis

Daniel Garcia-Romero, Carol Y. Espy-Wilson

University of Maryland, USA

The representation of a speaker's identity by means of Gaussian supervectors (GSV) is at the heart of most state-of-the-art speaker recognition systems. In this paper we present a novel procedure for visualizing GSVs that yields qualitative insight into the information they capture. Using this visualization approach, we study the relationship between a data-driven partition of the acoustic space and a knowledge-based partition (i.e., broad phonetic classes) on the Switchboard-I database (SWB-I). Moreover, the structure of an intersession variability subspace (IVS), computed from the SWB-I database, is analyzed by displaying the projection of a speaker's GSV onto the eigenvectors with the highest eigenvalues. This analysis reveals a strong presence of linguistic information in the IVS components with the highest energy. Finally, after the information contained in the IVS is projected away from the speaker's GSV, a visualization of the resulting GSV shows the speaker's characteristic patterns of spectral energy allocation.
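
The two subspace operations mentioned in the abstract, projecting a GSV onto the leading IVS eigenvectors and projecting the IVS away from the GSV, can be illustrated with a minimal sketch. The abstract does not specify how the IVS is estimated or removed, so this sketch assumes the eigenvectors are orthonormal and that removal is a plain orthogonal projection. The function names and the toy four-dimensional vectors are hypothetical, and plain Python lists are used only to keep the example self-contained.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto(basis: Sequence[Sequence[float]],
                 supervector: Sequence[float]) -> Vector:
    """Coordinates of the supervector along each basis vector.

    Assumption: the basis vectors are orthonormal (e.g. eigenvectors of a
    symmetric covariance matrix, each scaled to unit length).
    """
    return [dot(u, supervector) for u in basis]


def remove_subspace(basis: Sequence[Sequence[float]],
                    supervector: Sequence[float]) -> Vector:
    """Subtract the component of the supervector lying in span(basis).

    Computes m - sum_k (u_k . m) u_k, which is an orthogonal projection
    onto the complement of the subspace when the basis is orthonormal.
    """
    coords = project_onto(basis, supervector)
    residual = list(supervector)
    for c, u in zip(coords, basis):
        residual = [r - c * x for r, x in zip(residual, u)]
    return residual


# Toy data (hypothetical): two orthonormal "IVS eigenvectors" in 4 dimensions.
ivs_basis = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.8, 0.0],
]
gsv = [2.0, 1.0, 3.0, -1.0]

ivs_coords = project_onto(ivs_basis, gsv)      # what would be displayed
compensated = remove_subspace(ivs_basis, gsv)  # GSV with the IVS projected away
```

After removal, the compensated vector has no component left along any of the assumed IVS directions, which is the property the final visualization step in the abstract relies on. A real system would operate on supervectors with many thousands of dimensions and would normally use a linear algebra library instead of lists.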

Bibliographic reference. Garcia-Romero, Daniel / Espy-Wilson, Carol Y. (2008): "Intersession variability in speaker recognition: a behind the scene analysis", in INTERSPEECH-2008, 1413-1416.