In this paper, we propose the fusion of audio and explicit lip motion features for speaker identity verification applications. Experimental results using GMM-based speaker models indicate that audiovisual fusion with explicit lip motion information yields significant performance improvements for verifying both speaker identity and liveness, owing to its tracking of the closely coupled acoustic-labial dynamics. Experiments performed on different gender-specific subsets of data from the VidTIMIT and UCBN databases under clean and noisy conditions show that the best performance achieved is 7%-11% EER for the speaker verification task and 4%-8% EER for the liveness verification scenario.
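The abstract refers to GMM-based speaker models and audiovisual fusion. A minimal sketch of the general approach is shown below, assuming diagonal-covariance GMMs, a log-likelihood-ratio score against a background model, and a weighted-sum score-level fusion rule; the specific fusion scheme, feature dimensions, and parameters here are illustrative assumptions, not the paper's actual configuration.

```python
import math

def gmm_loglik(x, weights, means, variances):
    # Log-likelihood of feature vector x under a diagonal-covariance GMM,
    # computed with the log-sum-exp trick for numerical stability.
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(log_p)
    m = max(comp_logs)
    return m + math.log(sum(math.exp(l - m) for l in comp_logs))

def fused_score(audio_x, lip_x, audio_gmm, lip_gmm,
                audio_ubm, lip_ubm, alpha=0.5):
    # Per-modality log-likelihood ratio (claimant model vs. background model),
    # combined by a weighted sum (alpha weights the audio modality).
    s_audio = gmm_loglik(audio_x, *audio_gmm) - gmm_loglik(audio_x, *audio_ubm)
    s_lip = gmm_loglik(lip_x, *lip_gmm) - gmm_loglik(lip_x, *lip_ubm)
    return alpha * s_audio + (1 - alpha) * s_lip

# Toy 1-D models purely for illustration (real systems use trained
# multi-dimensional GMMs over acoustic and lip-motion features).
claimant = ([0.5, 0.5], [[0.0], [2.0]], [[1.0], [1.0]])  # 2-component GMM
background = ([1.0], [[5.0]], [[4.0]])                    # 1-component UBM

# A claimed-identity trial: accept if the fused score exceeds a threshold.
genuine = fused_score([0.0], [0.0], claimant, claimant, background, background)
impostor = fused_score([5.0], [5.0], claimant, claimant, background, background)
```

Here a genuine trial (features near the claimant model) produces a positive log-likelihood ratio while an impostor trial produces a negative one; the accept/reject threshold on the fused score is what an EER evaluation sweeps.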
Bibliographic reference. Chetty, Girija / Wagner, Michael (2007): "Audiovisual speaker identity verification based on lip motion features", in Proc. INTERSPEECH 2007, pp. 2045-2048.