ISCA Archive Interspeech 2009

Robust audio-visual speech synchrony detection by generalized bimodal linear prediction

Kshitiz Kumar, Jiri Navratil, Etienne Marcheret, Vit Libal, Gerasimos Potamianos

We study the problem of detecting audio-visual synchrony in video segments containing a speaker in frontal head pose. The problem has a number of important applications, for example speech source localization, speech activity detection, speaker diarization, speech source separation, and biometric spoofing detection. In particular, we build on earlier work, extending our previously proposed time-evolution model of audio-visual features to include non-causal (future) feature information. This significantly improves the method's robustness to small time-alignment errors between the audio and visual streams, as demonstrated by our experiments. In addition, we compare the proposed model to two approaches known from the literature for audio-visual synchrony detection, namely mutual information and hypothesis testing, and we show that our method is superior to both.
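The core idea of bimodal linear prediction with non-causal taps can be illustrated with a small sketch: predict each audio feature from visual features at both past and future lags via least squares, and use the residual energy as a synchrony score. This is an illustrative toy, not the paper's implementation; the function names, lag set, synthetic data, and the variance-explained score are all assumptions.

```python
# Illustrative sketch only: bimodal linear prediction with causal AND
# non-causal (future) visual taps. Not the authors' actual model.
import numpy as np

def build_lagged(v, lags):
    """Stack visual features v[t + l] for each lag l (negative = past, positive = future)."""
    return np.stack([np.roll(v, -l) for l in lags], axis=1)  # shape (T, len(lags))

def synchrony_score(a, v, lags=(-2, -1, 0, 1, 2)):
    """Fit a[t] ~ sum_l w_l * v[t + l] by least squares.
    Returns the fraction of audio-feature variance explained by the visual
    stream; higher values suggest stronger audio-visual synchrony."""
    X = build_lagged(v, lags)
    margin = max(abs(l) for l in lags)          # drop wrap-around edge frames
    Xc, ac = X[margin:-margin], a[margin:-margin]
    w, *_ = np.linalg.lstsq(Xc, ac, rcond=None)
    resid = ac - Xc @ w
    return 1.0 - resid.var() / ac.var()

# Toy demo: a synchronous audio track driven by the visual signal
# (with a slight look-ahead, which the non-causal taps can capture)
# versus an independent, desynchronized one.
rng = np.random.default_rng(0)
v = rng.standard_normal(500)
a_sync = 0.8 * v + 0.3 * np.roll(v, -1) + 0.1 * rng.standard_normal(500)
a_desync = rng.standard_normal(500)
print(synchrony_score(a_sync, v))    # close to 1: strongly synchronous
print(synchrony_score(a_desync, v))  # close to 0: no synchrony
```

Including future lags (here +1, +2) is what makes the score tolerant of small time-alignment errors between the streams: a slightly early or late visual feature still falls inside the prediction window.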


doi: 10.21437/Interspeech.2009-639

Cite as: Kumar, K., Navratil, J., Marcheret, E., Libal, V., Potamianos, G. (2009) Robust audio-visual speech synchrony detection by generalized bimodal linear prediction. Proc. Interspeech 2009, 2251-2254, doi: 10.21437/Interspeech.2009-639

@inproceedings{kumar09_interspeech,
  author={Kshitiz Kumar and Jiri Navratil and Etienne Marcheret and Vit Libal and Gerasimos Potamianos},
  title={{Robust audio-visual speech synchrony detection by generalized bimodal linear prediction}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2251--2254},
  doi={10.21437/Interspeech.2009-639}
}