Auditory-Visual Speech Processing (AVSP'99)
August 7-10, 1999
We describe our Finnish audio-visual speech synthesizer, report its evaluation, and discuss possible improvements. We have combined a three-dimensional facial model with a commercial audio text-to-speech synthesizer. The visual speech is based on a letter-to-viseme mapping, and the animation is created by linear interpolation between the visemes. An intelligibility test was run to quantify the benefit of seeing a synthetic or natural face while hearing a synthetic or natural voice presented at different signal-to-noise ratios. Both natural and synthetic faces improved the intelligibility of both natural and synthetic auditory speech. We examined the confusion patterns of consonants and the identification of the Finnish visemes. We also propose how the viseme repertoire of the talking head can be improved.
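The letter-to-viseme mapping and linear interpolation mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-parameter face representation (jaw opening, lip rounding), the numeric viseme targets, the small letter table, and the names `VISEMES`, `LETTER_TO_VISEME`, `letters_to_visemes`, `lerp` and `animate`. The actual synthesizer drives a three-dimensional facial model whose parameters and viseme set are not given in the abstract.

```python
from typing import Dict, List

# Hypothetical viseme targets. Each viseme is a vector of facial parameters,
# here [jaw_opening, lip_rounding]. The values are illustrative only.
VISEMES: Dict[str, List[float]] = {
    "a": [1.0, 0.0],
    "o": [0.6, 0.8],
    "u": [0.3, 1.0],
    "i": [0.2, 0.0],
    "p": [0.0, 0.0],      # bilabial closure
    "rest": [0.1, 0.1],   # fallback for letters not in the table
}

# Hypothetical letter-to-viseme table. Several letters may share one viseme.
LETTER_TO_VISEME: Dict[str, str] = {
    "a": "a", "o": "o", "u": "u", "i": "i",
    "p": "p", "b": "p", "m": "p",
}


def letters_to_visemes(text: str) -> List[str]:
    """Map each alphabetic character to a viseme name, ignoring other characters."""
    return [LETTER_TO_VISEME.get(ch, "rest") for ch in text.lower() if ch.isalpha()]


def lerp(a: List[float], b: List[float], t: float) -> List[float]:
    """Linearly interpolate between two parameter vectors, with t in [0, 1]."""
    return [x + (y - x) * t for x, y in zip(a, b)]


def animate(text: str, frames_per_viseme: int = 4) -> List[List[float]]:
    """Return one parameter vector per frame, interpolating between viseme targets."""
    keys = [VISEMES[name] for name in letters_to_visemes(text)]
    if not keys:
        return []
    frames: List[List[float]] = []
    for start, end in zip(keys, keys[1:]):
        for k in range(frames_per_viseme):
            frames.append(lerp(start, end, k / frames_per_viseme))
    frames.append(list(keys[-1]))  # end exactly on the last viseme target
    return frames


if __name__ == "__main__":
    for frame in animate("ma", frames_per_viseme=2):
        print(frame)
```

In this sketch the word "ma" moves from the bilabial closure to the open vowel in equal steps. A real system would also need timing from the audio synthesizer so that each viseme target coincides with its sound, which this sketch leaves out.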
Bibliographic reference. Olives, Jean-Luc / Mottonen, Riikka / Kulju, Janne / Sams, Mikko (1999): "Audio-visual speech synthesis for Finnish", In AVSP-1999, paper #27.