State-of-the-art language recognition systems usually add a backend prior to the linear fusion of the subsystem scores. The backend plays a dual role. On the one hand, when the set of languages for which models have been trained does not match the set of target languages, the backend maps the available scores to the space of target languages. On the other hand, the backend serves as a pre-calibration stage that adapts the amorphous score space. In this work, well-known backends (Generative Gaussian Backend, Discriminative Gaussian Backend and Logistic Regression Backend) and newer proposals (Fully Bayesian Gaussian Backend and Gaussian Mixture Backend) are analyzed and compared. The effect of applying T-Norm or ZT-Norm score normalization is also analyzed. Finally, the effect of discarding the development signals with the highest scores is also studied. Experiments were carried out on the NIST 2009 LRE database, using a state-of-the-art language recognition system consisting of the fusion of five subsystems: a Linearized Eigenchannel GMM (LE-GMM) subsystem, an iVector subsystem and three phone-lattice-SVM subsystems.
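As a concrete illustration of the generative Gaussian backend named above, the following is a minimal NumPy sketch: one mean per language, a single covariance matrix shared (pooled) across languages, and per-language log-likelihoods as backend outputs. The function names and the synthetic setup are ours, not from the paper.

```python
import numpy as np

def train_gaussian_backend(scores, labels):
    """Fit a generative Gaussian backend on development score vectors:
    one mean per language class, plus a pooled (shared) full covariance."""
    classes = np.unique(labels)
    means = np.array([scores[labels == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance, shared across all languages
    centered = scores - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(scores)
    return classes, means, np.linalg.inv(cov)

def apply_gaussian_backend(scores, means, precision):
    """Return per-language log-likelihoods (up to an additive constant).
    With a shared covariance, the quadratic term in the input is common to
    all classes, so the resulting backend is linear in the input scores."""
    # log N(x; mu_c, Sigma) = -0.5 (x - mu_c)^T Sigma^{-1} (x - mu_c) + const
    diffs = scores[:, None, :] - means[None, :, :]
    return -0.5 * np.einsum('nci,ij,ncj->nc', diffs, precision, diffs)
```

A typical use would be to train on held-out development scores and then map each trial's vector of subsystem scores to calibrated per-language log-likelihoods before fusion.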
Index Terms: Spoken Language Recognition, Gaussian Backend, Gaussian Mixture Backend, Discriminative Gaussian Backend
Bibliographic reference: Penagarikano, Mikel / Varona, Amparo / Diez, Mireia / Rodriguez-Fuentes, Luis Javier / Bordel, German (2012): "Study of different backends in a state-of-the-art language recognition system", in Proc. INTERSPEECH 2012, pp. 2049-2052.