10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Audio-Visual Speech Asynchrony Modeling in a Talking Head

Alexey Karpov (1), Liliya Tsirulnik (2), Zdeněk Krňoul (3), Andrey Ronzhin (1), Boris Lobanov (2), Miloš Železný (3)

(1) Russian Academy of Sciences, Russia
(2) National Academy of Sciences, Belarus
(3) University of West Bohemia in Pilsen, Czech Republic

This paper proposes an audio-visual speech synthesis system that models the asynchrony between the auditory and visual speech modalities. A corpus-based study of real recordings provided the data needed to understand this inter-modal asynchrony, which is partially caused by co-articulation. From this study, a set of context-dependent timing rules and recommendations was developed so that the auditory and visual speech cues of the animated talking head are synchronized in a natural, human-like way. A cognitive evaluation of the model-based talking head for Russian, implementing the proposed asynchrony model, showed high intelligibility and naturalness of the synthesized audio-visual speech.

Bibliographic reference.  Karpov, Alexey / Tsirulnik, Liliya / Krňoul, Zdeněk / Ronzhin, Andrey / Lobanov, Boris / Železný, Miloš (2009): "Audio-visual speech asynchrony modeling in a talking head", In INTERSPEECH-2009, 2911-2914.