Auditory-Visual Speech Processing 2007 (AVSP2007)

Kasteel Groenendaal, Hilvarenbeek, The Netherlands
August 31 - September 3, 2007

Audiovisual Speaker Identity Verification Based on Cross Modal Fusion

Girija Chetty, Michael Wagner

School of Information Sciences and Engineering, University of Canberra, Australia

In this paper, we propose the fusion of audio and explicit correlation features for speaker identity verification applications. Experiments with GMM-based speaker models show that a hybrid fusion technique, involving late fusion of explicit cross-modal fusion features with eigenlip and audio MFCC features, allows a considerable improvement in equal error rate (EER) performance. An evaluation of system performance on different gender-specific datasets from the controlled VidTIMIT database and the opportunistic UCBN database shows that it is possible to achieve an EER of less than 2% with correlated-component hybrid fusion, an improvement of around 22% over uncorrelated-component fusion.
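The abstract does not state the exact late-fusion rule or how the EER is computed. The following is a minimal sketch, not the authors' method. It assumes that each modality (for example audio MFCC, eigenlip and cross-modal features) already produces one GMM verification score per trial, normalised to a comparable range. It fuses those scores with a weighted sum and estimates the EER by sweeping every observed score as a decision threshold. The function names, weights and toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def late_fusion(scores_by_modality: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted-sum late fusion of per-trial scores from several modalities.

    Hypothetical helper: assumes every modality's scores are already
    normalised to a comparable range and aligned trial by trial.
    """
    if len(scores_by_modality) != len(weights):
        raise ValueError("one weight per modality is required")
    n_trials = len(scores_by_modality[0])
    if any(len(s) != n_trials for s in scores_by_modality):
        raise ValueError("all modalities must score the same trials")
    # Fused score for trial t = sum over modalities of weight * score.
    return [sum(w * s[t] for w, s in zip(weights, scores_by_modality))
            for t in range(n_trials)]


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Estimate (EER, threshold) by sweeping every observed score.

    A trial is accepted when score >= threshold.
    FAR = fraction of impostor trials accepted.
    FRR = fraction of genuine trials rejected.
    The EER is taken at the threshold where |FAR - FRR| is smallest,
    reported as the mean of FAR and FRR at that point.
    """
    best = None  # (gap, eer, threshold)
    for th in sorted(set(genuine) | set(impostor)):
        far = sum(s >= th for s in impostor) / len(impostor)
        frr = sum(s < th for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, th)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the paper.
    audio = [0.9, 0.7, 0.2, 0.4]
    visual = [0.8, 0.6, 0.1, 0.5]
    fused = late_fusion([audio, visual], [0.6, 0.4])
    eer, th = equal_error_rate(genuine=fused[:2], impostor=fused[2:])
    print(f"EER={eer:.2f} at threshold {th:.2f}")
```

A weighted sum is only one common late-fusion rule. The weights would normally be tuned on held-out data, and a finer EER estimate would interpolate between thresholds instead of picking the closest observed score.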

Bibliographic reference.  Chetty, Girija / Wagner, Michael (2007): "Audiovisual speaker identity verification based on cross modal fusion", In AVSP-2007, paper P37.