ISCA Archive Interspeech 2006

Speaking faces for face-voice speaker identity verification

Girija Chetty, Michael Wagner

In this paper, we describe an approach to animated speaking-face synthesis and its application in modelling impostor/replay-attack scenarios for face-voice based speaker verification systems. The speaking face reported here learns the spatiotemporal relationship between speech acoustics and MPEG-4 compliant facial animation points. The influence of articulatory, perceptual and prosodic acoustic features, along with auditory context, on prediction accuracy was examined. The results indicate that audiovisual identity verification systems are vulnerable to impostor/replay attacks using synthetic faces. The level of vulnerability depends on several factors, such as the type of audiovisual features, the fusion techniques used for the audio and video features, and their relative robustness. The success of the synthetic impostor also depends on the type of co-articulation models and acoustic features used for the audiovisual mapping of speaking-face synthesis.
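The abstract does not specify the learning algorithm behind the audio-to-FAP mapping, so the following is only a minimal sketch of the general technique it describes: context-windowed acoustic features regressed onto MPEG-4 facial animation parameter (FAP) trajectories. The feature dimensions, the ±3-frame context window, the ridge regressor, and the synthetic data are all assumptions for illustration, not the authors' method.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical dimensions (assumptions): 13 MFCCs per frame,
# 68 MPEG-4 FAP targets, and a +/-3 frame auditory context window.
N_FRAMES, N_MFCC, N_FAP, CONTEXT = 500, 13, 68, 3

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((N_FRAMES, N_MFCC))  # stand-in acoustic features
fap = rng.standard_normal((N_FRAMES, N_FAP))    # stand-in animation targets

def stack_context(feats, w):
    """Concatenate each frame with its w neighbours on either side,
    padding at the sequence edges, so the regressor sees temporal context."""
    padded = np.pad(feats, ((w, w), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * w + 1)])

X = stack_context(mfcc, CONTEXT)  # shape: (N_FRAMES, (2w+1) * N_MFCC)

# Ridge regression as a stand-in for the learned audio-to-FAP mapping.
model = Ridge(alpha=1.0).fit(X, fap)
predicted_fap = model.predict(X)  # FAP trajectories to drive the face model
print(predicted_fap.shape)        # (500, 68)
```

In practice the predicted FAP streams would drive an MPEG-4 face model frame by frame; the quality of such a mapping, and hence the strength of the resulting synthetic impostor, is what the paper evaluates against fused face-voice verification.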


doi: 10.21437/Interspeech.2006-166

Cite as: Chetty, G., Wagner, M. (2006) Speaking faces for face-voice speaker identity verification. Proc. Interspeech 2006, paper 2025-Mon3A1O.6, doi: 10.21437/Interspeech.2006-166

@inproceedings{chetty06_interspeech,
  author={Girija Chetty and Michael Wagner},
  title={{Speaking faces for face-voice speaker identity verification}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 2025-Mon3A1O.6},
  doi={10.21437/Interspeech.2006-166}
}