Auditory-Visual Speech Processing (AVSP'99)

August 7-10, 1999
Santa Cruz, CA, USA

A Text-Speech Synchronization Technique with Applications to Talking Heads

Fabio Vignoli, Carlo Braccini

DIST - University of Genova, Italy

In human communication, speech understanding improves considerably when the acoustic signal is accompanied by visual cues, compared with speech alone, particularly in noisy environments. In this paper we propose a novel procedure for synchronizing text with speech, intended to reduce the time needed to develop user-friendly audio-visual interfaces or authoring tools for multimedia production. The technique combines neural-network-based processing of the speech signal with a time alignment algorithm. The proposed algorithm is fast and speaker independent because its neural networks are trained to discriminate among broad phoneme classes rather than to recognize speech. The technique has been used to animate the MPEG-4 compliant face model developed at DIST [3].
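
To make the two-stage idea concrete, the following is a minimal sketch of how a time alignment step could work once a classifier has scored each speech frame against a set of broad phoneme classes. It is not the algorithm of this paper, whose details are not given in the abstract: it is a generic monotonic dynamic-programming (Viterbi-style) alignment, and the class names, the `frame` helper, and the `align` function are hypothetical names chosen for illustration. The per-frame log-probabilities stand in for the neural network outputs, and the class sequence stands in for what would be derived from the text.

```python
import math

# Hypothetical broad phoneme classes; the paper's actual class set is not stated here.
CLASSES = ("silence", "vowel", "consonant")


def frame(best, p=0.8):
    """Build stand-in classifier output for one frame: log-probability per class."""
    rest = (1.0 - p) / (len(CLASSES) - 1)
    return {c: math.log(p if c == best else rest) for c in CLASSES}


def align(frame_log_probs, class_sequence):
    """Monotonic alignment of frames to an expected class sequence.

    Returns a list of (label, start_frame, end_frame) with end_frame exclusive.
    Every element of class_sequence receives at least one frame.
    """
    T, N = len(frame_log_probs), len(class_sequence)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per expected class")
    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = frame_log_probs[0][class_sequence[0]]
    for t in range(1, T):
        for j in range(min(t + 1, N)):
            stay = score[t - 1][j]                        # remain in the same class
            move = score[t - 1][j - 1] if j > 0 else NEG  # advance to the next class
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > NEG:
                score[t][j] = best + frame_log_probs[t][class_sequence[j]]
    # Trace back from the last frame, which must belong to the last class.
    path = [0] * T
    j = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = j
        j = back[t][j]
    # Collapse the frame-level path into contiguous segments.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[start]:
            segments.append((class_sequence[path[start]], start, t))
            start = t
    return segments


# Six frames whose most likely classes are: silence, silence, vowel, vowel, vowel, consonant.
frames = [frame(c) for c in
          ("silence", "silence", "vowel", "vowel", "vowel", "consonant")]
segments = align(frames, ["silence", "vowel", "consonant"])
print(segments)
```

In this toy setting the segment boundaries, multiplied by the frame duration, would give the time stamps at which each text-derived unit starts, which is the kind of information a talking-head animation needs. Working with a few broad classes rather than full phoneme recognition keeps the scoring table small, which is consistent with the speed and speaker-independence argument made above.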

