12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Structural Joint Factor Analysis for Speaker Recognition

Marc Ferràs, Koichi Shinoda, Sadaoki Furui

Tokyo Institute of Technology, Japan

In recent years, adaptation techniques have received special attention in speaker recognition. By separating speaker and session variation effects, Joint Factor Analysis (JFA) has become established as a powerful adaptation framework and has been ubiquitous in recent NIST Speaker Recognition Evaluations (SRE). However, its global parameter sharing strategy is not necessarily optimal when only a small amount of adaptation data is available. In this paper, we address this issue using a regularization approach such as structural MAP. We introduce two variants of structural JFA (SJFA) that, depending on the amount of data, use coarser or finer parameter approximations in the adaptation process. One of these variants is shown to considerably outperform JFA. We report relative gains of over 25% in EER on the 2006 NIST SRE data for GMM-SVM systems using SJFA over systems using JFA.
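
As background, the separation of speaker and session effects that JFA performs is usually written as a supervector decomposition, M = m + Vy + Ux + Dz. The sketch below illustrates that standard formulation from the JFA literature only. It is not code or notation taken from this paper, and it does not model the structural (SJFA) variants. The toy dimensions, the random matrices and the helper names (matvec, jfa_supervector) are assumptions made for illustration.

```python
import random

def matvec(A, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def jfa_supervector(m, V, y, U, x, d, z):
    """Standard JFA decomposition M = m + V y + U x + D z.

    m: speaker-independent (UBM) mean supervector
    V, y: speaker subspace and speaker factors
    U, x: session subspace and session factors
    d, z: diagonal of D and the residual speaker offsets
    """
    Vy = matvec(V, y)
    Ux = matvec(U, x)
    return [mi + vy + ux + di * zi
            for mi, vy, ux, di, zi in zip(m, Vy, Ux, d, z)]

# Toy sizes (hypothetical): supervector dimension 6,
# 2 speaker factors, 1 session factor.
rng = random.Random(0)
dim, n_spk, n_ses = 6, 2, 1
m = [0.0] * dim
V = [[rng.gauss(0, 1) for _ in range(n_spk)] for _ in range(dim)]
U = [[rng.gauss(0, 1) for _ in range(n_ses)] for _ in range(dim)]
d = [0.1] * dim
y = [1.0, -0.5]   # speaker factors, shared across sessions
z = [0.0] * dim   # residual offsets, zero in this toy case

# The same speaker recorded in two sessions: only x changes.
M_a = jfa_supervector(m, V, y, U, [0.8], d, z)
M_b = jfa_supervector(m, V, y, U, [-0.3], d, z)

# The difference between the two sessions lies entirely in the
# session subspace spanned by U.
diff = [a - b for a, b in zip(M_a, M_b)]
expected = matvec(U, [0.8 - (-0.3)])
```

In this toy setting, V and U are shared globally across all supervector dimensions, which is the kind of global parameter sharing the abstract refers to.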


Bibliographic reference. Ferràs, Marc / Shinoda, Koichi / Furui, Sadaoki (2011): "Structural joint factor analysis for speaker recognition", In INTERSPEECH-2011, 2373-2376.