Auditory-Visual Speech Processing (AVSP'99)

August 7-10, 1999
Santa Cruz, CA, USA

Audio-Visual Speech Synthesis for Finnish

Jean-Luc Olives, Riikka Mottonen, Janne Kulju, Mikko Sams

Laboratory of Computational Engineering, Helsinki University of Technology, Finland

We describe our Finnish audio-visual speech synthesizer and its evaluation, and discuss possible improvements. We have combined a three-dimensional facial model with a commercial audio text-to-speech synthesizer. The visual speech is based on a letter-to-viseme mapping, and the animation is created by linear interpolation between the visemes. An intelligibility test was run to quantify the benefit of seeing the synthetic or natural face when hearing the synthetic or natural voice presented at different signal-to-noise ratios. Both natural and synthetic faces improved the intelligibility of both natural and synthetic auditory speech. We examined the confusion patterns of consonants and the identification of the Finnish visemes. We also propose how the viseme repertoire of the talking head can be improved.
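
The letter-to-viseme mapping and the linear interpolation between visemes mentioned in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the viseme names, the two facial parameters (jaw opening, lip rounding), their numeric targets, the fallback to a neutral viseme, and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical letter-to-viseme table; the paper's actual mapping is not given here.
LETTER_TO_VISEME = {
    "a": "open", "o": "round", "u": "round",
    "m": "closed", "p": "closed", "b": "closed",
}

# Each viseme is a target vector of made-up facial parameters:
# (jaw_opening, lip_rounding). The values are illustrative only.
VISEME_TARGETS = {
    "open": (1.0, 0.0),
    "round": (0.5, 1.0),
    "closed": (0.0, 0.0),
    "neutral": (0.2, 0.0),
}


def letters_to_visemes(text):
    """Map each letter to a viseme name; unmapped letters become 'neutral'."""
    return [LETTER_TO_VISEME.get(ch, "neutral")
            for ch in text.lower() if ch.isalpha()]


def lerp(a, b, t):
    """Linearly interpolate between parameter vectors a and b at t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def animate(text, frames_per_viseme=4):
    """Return per-frame parameter vectors, interpolated between viseme targets."""
    targets = [VISEME_TARGETS[v] for v in letters_to_visemes(text)]
    if not targets:
        return []
    frames = []
    # Walk consecutive pairs of viseme targets and fill in the frames between them.
    for start, end in zip(targets, targets[1:]):
        for i in range(frames_per_viseme):
            frames.append(lerp(start, end, i / frames_per_viseme))
    frames.append(targets[-1])  # finish exactly on the last viseme
    return frames
```

Calling `animate("ma", frames_per_viseme=2)` yields three frames that move from the closed-lips target to the open-jaw target. A real synthesizer would also need timing information from the audio text-to-speech engine to keep the face synchronized with the voice, which this sketch leaves out.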

Bibliographic reference. Olives, Jean-Luc / Mottonen, Riikka / Kulju, Janne / Sams, Mikko (1999): "Audio-visual speech synthesis for Finnish", in AVSP-1999, paper #27.