Auditory-Visual Speech Processing (AVSP'98)

December 4-6, 1998
Terrigal - Sydney, Australia

Visual Speech Synthesis With Concatenative Speech

Asa Hallgren, Bertil Lyberg

Telia Research AB, Sweden

Today, synthetic speech is often based on concatenation of natural speech: units such as diphones or polyphones are taken from natural speech and then put together to form any word or sentence. So far there have been two main ways of adding a visual modality to such synthesis: morphing between single images, or concatenating video sequences. In this study, however, a new method is presented in which recorded natural movements of points on the face are used to control an animated face.
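The idea of driving an animated face from concatenated recordings can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (a list of frames per unit, each frame a list of 2-D point positions), the function names `blend` and `concatenate_units`, the unit labels, and the single midpoint frame inserted at each join are all assumptions made for illustration. The abstract does not say how joins between units are handled.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Frame = List[Point]   # positions of the tracked facial points at one instant
Unit = List[Frame]    # recorded point trajectory for one unit (e.g. a diphone)


def blend(a: Frame, b: Frame, w: float) -> Frame:
    """Linearly interpolate between two frames; w=0 gives a, w=1 gives b."""
    return [((1 - w) * ax + w * bx, (1 - w) * ay + w * by)
            for (ax, ay), (bx, by) in zip(a, b)]


def concatenate_units(names: List[str], inventory: Dict[str, Unit]) -> List[Frame]:
    """Join the recorded trajectories of the named units into one sequence.

    Assumption: one midpoint frame is inserted at each join to soften the
    transition. The real system may smooth joins differently or not at all.
    """
    out: List[Frame] = []
    for name in names:
        unit = inventory[name]
        if out and unit:
            out.append(blend(out[-1], unit[0], 0.5))
        out.extend(unit)
    return out


# Toy inventory: one tracked point, two frames per unit (made-up numbers
# and hypothetical unit labels).
inventory = {
    "h-e": [[(0.0, 0.0)], [(0.0, 1.0)]],
    "e-j": [[(0.0, 3.0)], [(0.0, 2.0)]],
}
trajectory = concatenate_units(["h-e", "e-j"], inventory)
```

In a sketch like this, each frame of `trajectory` would be passed to the face model as target positions for its control points, in step with the concatenated audio units.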



Bibliographic reference. Hallgren, Asa / Lyberg, Bertil (1998): "Visual speech synthesis with concatenative speech", in AVSP-1998, 181-184.

Multimedia Files

Link                       Original Filename  Description                                       Format
av98_181_1.mov (12941 KB)  0045_01.mov        Example of concatenative visual speech synthesis  Video File: QuickTime; 320x240, 25 Hz, 24 bits per pixel, compressed (RLE)
av98_181_2.mov (16519 KB)  0045_02.mov        Example of concatenative visual speech synthesis  Video File: QuickTime; 320x240, 25 Hz, 24 bits per pixel, compressed (RLE)