International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Face Synthesis Driven by Audio Speech Input Based on HMMs

Ling Sun, Wei Lai, Ren-Hua Wang

Department of Electronic Engineering and Information Science, University of Science & Technology of China, Hefei, China

In this paper, an HMM-based visual speech system driven by audio speech input is designed to render a face model while the synchronous audio is played. Our approach differs from many methods adopted by other researchers in how the models are trained. We first train models for every initial and final in Mandarin, using a large quantity of audio training data recorded in different environments and spoken by different people. The recorded synchronous audiovisual speech data are then used to adapt the trained models to our specific announcer. Such models are more robust in the synthesis phase, and satisfactory performance can be achieved even when the input audio speech is degraded by noise.

Bibliographic reference. Sun, Ling / Lai, Wei / Wang, Ren-Hua (2002): "Face synthesis driven by audio speech input based on HMMs", in ISCSLP 2002, paper 9.