8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Audiovisual Speaker Identity Verification Based on Lip Motion Features

Girija Chetty, Michael Wagner

University of Canberra, Australia

In this paper, we propose fusing audio features with explicit lip motion features for speaker identity verification. Experimental results using GMM-based speaker models indicate that audiovisual fusion with explicit lip motion information significantly improves performance in verifying both speaker identity and liveness, because it tracks the closely coupled acoustic-labial dynamics. Experiments on gender-specific subsets of the VidTIMIT and UCBN databases under clean and noisy conditions show that the best performance is an equal error rate (EER) of 7%-11% for the speaker verification task and 4%-8% for the liveness verification scenario.
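
The abstract reports results as EER, the operating point at which the false-acceptance rate equals the false-rejection rate. As a rough illustration of how that figure can be computed from verification scores, here is a minimal sketch. The function name `equal_error_rate`, the "accept when score >= threshold" convention, and the toy scores are all assumptions made for illustration; they are not taken from the paper, which may use a different scoring or interpolation procedure.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    Hypothetical helper: a claim is accepted when its score >= threshold.
    """
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False-acceptance rate: impostor claims that pass the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False-rejection rate: genuine claims that fail the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores for illustration only (not data from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With few trials the two error rates may never be exactly equal, which is why this sketch averages them at the closest threshold rather than interpolating between neighbouring points.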

Bibliographic reference. Chetty, Girija / Wagner, Michael (2007): "Audiovisual speaker identity verification based on lip motion features", in INTERSPEECH-2007, 2045-2048.