11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Synthesizing Photo-Real Talking Head via Trajectory-Guided Sample Selection

Lijuan Wang (1), Xiaojun Qian (2), Wei Han (1), Frank K. Soong (1)

(1) Microsoft Research, China
(2) Chinese University of Hong Kong, China

We propose a trajectory-guided approach that concatenates real image samples to synthesize high-quality, photo-real articulator animation. Given a speech signal, it renders a photo-real video of the articulators in sync with the speech by searching the library for the real image sample sequence closest to the HMM-predicted trajectory. We evaluate the system objectively in terms of mean squared error (MSE) and investigate pruning strategies in terms of storage and processing speed. Our talking head took part in the LIPS2009 Challenge and won first place, with a subjective mean opinion score (MOS) of 4.15 for audio-visual match, as evaluated by 20 human subjects.
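
The sample search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the cost function, feature representation, or search algorithm, so everything below is an assumption made for illustration: each frame and each library image is reduced to a small feature vector, the target cost is the squared Euclidean distance to the predicted trajectory, the concatenation cost is the squared distance between consecutive selected samples, and the best sequence is found by a Viterbi-style dynamic program. The names `select_samples`, `mse`, and `concat_weight` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_samples(trajectory: List[Vector],
                   library: List[Vector],
                   concat_weight: float = 1.0) -> List[int]:
    """Return one library index per frame, minimizing the total cost.

    Assumed cost (not taken from the paper):
      sum of target costs (sample vs. predicted frame)
      + concat_weight * sum of concatenation costs (adjacent samples).
    """
    if not trajectory or not library:
        return []
    n = len(library)
    # cost[j]: best cumulative cost of any path ending in library sample j.
    cost = [sq_dist(trajectory[0], s) for s in library]
    backptrs: List[List[int]] = []
    for frame in trajectory[1:]:
        new_cost, ptr = [], []
        for j in range(n):
            # Best predecessor i for ending this frame on sample j.
            best_i = min(
                range(n),
                key=lambda i: cost[i]
                + concat_weight * sq_dist(library[i], library[j]),
            )
            new_cost.append(
                cost[best_i]
                + concat_weight * sq_dist(library[best_i], library[j])
                + sq_dist(frame, library[j])
            )
            ptr.append(best_i)
        cost = new_cost
        backptrs.append(ptr)
    # Backtrack from the cheapest final sample.
    j = min(range(n), key=lambda k: cost[k])
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


def mse(predicted: List[Vector], selected: List[Vector]) -> float:
    """Mean squared error over all feature dimensions of all frames."""
    total = sum(sq_dist(p, s) for p, s in zip(predicted, selected))
    count = sum(len(p) for p in predicted)
    return total / count


# Toy usage: 1-D features, a four-sample library, a three-frame trajectory.
library = [[0.0], [1.0], [2.0], [5.0]]
trajectory = [[0.1], [0.9], [2.2]]
path = select_samples(trajectory, library, concat_weight=0.0)
error = mse(trajectory, [library[i] for i in path])
```

With `concat_weight=0.0` the search reduces to a per-frame nearest-neighbor lookup; a larger weight favors smoother sample sequences at the expense of fidelity to the predicted trajectory. This exhaustive search costs on the order of T·N² distance evaluations for T frames and N library samples, which suggests why pruning matters for storage and speed, although the pruning strategies the paper actually investigates are not described in the abstract.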


Bibliographic reference.  Wang, Lijuan / Qian, Xiaojun / Han, Wei / Soong, Frank K. (2010): "Synthesizing photo-real talking head via trajectory-guided sample selection", In INTERSPEECH-2010, 446-449.