12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Conversational-Side-Specific Inter-Session Variability Compensation

Mohamed Kamal Omar, Jason Pelecanos

IBM T.J. Watson Research Center, USA

General techniques for inter-session variability compensation may not capture session and channel information specific to a given conversational side. This paper investigates three methods for estimating a conversational-side-specific projection or affine transform to compensate for session and channel effects. In the first, we estimate the projection from a within-class covariance matrix computed on a conversational-side-specific subset of the development data. In the second, we estimate the projection parameters with a discriminative objective function, which we maximize using an iterative algorithm similar to the expectation-maximization (EM) algorithm. In the third, we estimate an affine transform of the observation vectors of each conversational side by maximum likelihood, with the objective function computed on a selected subset of the development data. We present several experiments comparing these three techniques with our baseline system on the interview tasks of the NIST 2008 and NIST 2010 speaker recognition evaluations. The best of these techniques gives a relative performance improvement of up to 20% over the baseline system.
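
As an illustration of the first method only, the sketch below estimates a within-class covariance matrix on a side-specific subset of development vectors and derives a normalizing transform from it. The abstract does not say how the subset is chosen or what form the projection takes, so everything here is an assumption: the cosine-similarity selection rule, the generic within-class covariance normalization (WCCN) via a Cholesky factor, the ridge term, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_side_subset(dev_vectors, side_vector, k):
    """Hypothetical selection rule (an assumption, not the paper's):
    indices of the k development vectors most cosine-similar to the side."""
    dev = np.asarray(dev_vectors, dtype=float)
    side = np.asarray(side_vector, dtype=float)
    norms = np.linalg.norm(dev, axis=1) * np.linalg.norm(side) + 1e-12
    sims = dev @ side / norms
    return np.argsort(-sims)[:k]


def within_class_covariance(vectors, labels):
    """Pooled scatter of the vectors around their own speaker means."""
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    dim = vectors.shape[1]
    scatter = np.zeros((dim, dim))
    for speaker in np.unique(labels):
        rows = vectors[labels == speaker]
        centered = rows - rows.mean(axis=0)
        scatter += centered.T @ centered
    return scatter / len(vectors)


def wccn_transform(w, ridge=1e-6):
    """Generic WCCN: return B such that B @ B.T == inv(W + ridge * I).
    The ridge keeps the inverse defined when the subset is small."""
    dim = w.shape[0]
    return np.linalg.cholesky(np.linalg.inv(w + ridge * np.eye(dim)))


def side_specific_transform(dev_vectors, dev_labels, side_vector, k):
    """Estimate a transform from the subset selected for one side."""
    dev = np.asarray(dev_vectors, dtype=float)
    labels = np.asarray(dev_labels)
    idx = select_side_subset(dev, side_vector, k)
    w = within_class_covariance(dev[idx], labels[idx])
    return wccn_transform(w)
```

Under these assumptions, a vector x from the conversational side would be compensated as `B.T @ x`, where `B = side_specific_transform(dev, labels, x, k)`. Because the transform is re-estimated for each side, the subset size `k` trades locality against the reliability of the covariance estimate.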


Bibliographic reference. Omar, Mohamed Kamal / Pelecanos, Jason (2011): "Conversational-side-specific inter-session variability compensation", in INTERSPEECH-2011, 497-500.