INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Anchor-Model Fusion for Language Recognition

Ignacio Lopez-Moreno, Daniel Ramos, Joaquin Gonzalez-Rodriguez, Doroteo T. Toledano

Universidad Autónoma de Madrid, Spain

State-of-the-art language recognition systems usually combine multiple acoustic and phonotactic subsystems. The outputs of those subsystems can be fused in different ways, but the score for a trial is always obtained from N scores, one from each of the N subsystems. In this paper, a robust novel approach to subsystem fusion in language recognition is proposed, based on the relative performance of each trial not just against the claimed model but against all available models. The proposed technique exploits the relative behavior of a given speech utterance over the cohort of anchor models from the different subsystems, resulting in the proposed anchor-model fusion. Experiments fusing seven phone-SVM subsystems submitted by the authors to NIST LRE 2007 demonstrate its robustness to non-uniform data availability relative to rule-based and trained fusion schemes such as linear-kernel SVM, as well as significant improvements in performance in both average EER and Cavg as used in NIST LRE.
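The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores, and in particular the choice of cosine similarity to a per-language centroid as the comparison in anchor-model space are assumptions made here for illustration only. The sketch shows a conventional fusion that uses only the N scores for the claimed language, next to a fusion that stacks the scores against all M language models of all N subsystems into one vector.

```python
import math

def anchor_vector(scores):
    """Flatten scores[subsystem][language] into a single N*M vector."""
    return [s for subsystem in scores for s in subsystem]

def sum_fusion(scores, lang):
    """Conventional rule-based baseline: average the N scores for the claimed language."""
    return sum(subsystem[lang] for subsystem in scores) / len(scores)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def anchor_fusion_score(test_scores, language_centroid):
    """Assumed comparison: similarity of the stacked test vector to a language centroid."""
    return cosine(anchor_vector(test_scores), language_centroid)

# Hypothetical toy data: N = 2 subsystems, M = 3 language models.
train_lang0 = [
    [[2.0, -1.0, -1.0], [1.5, -0.5, -1.0]],
    [[1.8, -0.8, -1.2], [1.7, -0.7, -0.9]],
]
train_lang1 = [
    [[-1.0, 2.0, -1.0], [-0.5, 1.5, -1.0]],
]
centroid0 = centroid([anchor_vector(u) for u in train_lang0])
centroid1 = centroid([anchor_vector(u) for u in train_lang1])

test_utterance = [[1.9, -0.9, -1.0], [1.6, -0.6, -1.0]]
baseline = sum_fusion(test_utterance, 0)                  # uses 2 of the 6 scores
fused0 = anchor_fusion_score(test_utterance, centroid0)   # uses all 6 scores
fused1 = anchor_fusion_score(test_utterance, centroid1)
```

In this toy setup the baseline sees only the two scores for the claimed language, while the anchor-space score depends on how the utterance behaves against every model in every subsystem.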


Bibliographic reference.  Lopez-Moreno, Ignacio / Ramos, Daniel / Gonzalez-Rodriguez, Joaquin / Toledano, Doroteo T. (2008): "Anchor-model fusion for language recognition", In INTERSPEECH-2008, 727-730.