We propose a trajectory-guided approach that concatenates real image samples to synthesize high-quality, photo-real articulator animation. It renders a photo-real video of the articulators in sync with a given speech signal by searching the library for the real image sample sequence closest to the HMM-predicted trajectory. We evaluate the system objectively in terms of mean squared error (MSE) and investigate pruning strategies in terms of storage and processing speed. Our talking head took part in the LIPS2009 Challenge and won first place, with a subjective MOS of 4.15 in audio-visual match as evaluated by 20 human subjects.
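The search for the sample sequence closest to a predicted trajectory can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the function names (`select_samples`, `mse`), the squared-Euclidean target cost, the concatenation cost between consecutive samples, and the `concat_weight` parameter are all assumptions made for illustration, and the abstract does not specify the actual cost functions or pruning strategy.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_samples(trajectory: List[Vector], library: List[Vector],
                   concat_weight: float = 1.0) -> List[int]:
    """Pick one library index per frame with a Viterbi-style search.

    Assumed costs (hypothetical, not taken from the paper):
      target cost = squared distance from sample to predicted frame
      concat cost = squared distance between consecutive chosen samples
    """
    n, m = len(trajectory), len(library)
    if n == 0 or m == 0:
        return []
    # cost[j]: lowest total cost of any path ending at library sample j
    cost = [sq_dist(trajectory[0], s) for s in library]
    backpointers: List[List[int]] = []
    for t in range(1, n):
        new_cost, ptr = [], []
        for j in range(m):
            # Best predecessor i for sample j at frame t
            best_i = min(
                range(m),
                key=lambda i: cost[i] + concat_weight * sq_dist(library[i], library[j]),
            )
            trans = cost[best_i] + concat_weight * sq_dist(library[best_i], library[j])
            new_cost.append(trans + sq_dist(trajectory[t], library[j]))
            ptr.append(best_i)
        cost = new_cost
        backpointers.append(ptr)
    # Trace the cheapest path back from the last frame
    j = min(range(m), key=lambda k: cost[k])
    path = [j]
    for ptr in reversed(backpointers):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


def mse(predicted: List[Vector], selected: List[Vector]) -> float:
    """Mean squared error per feature element over all frames."""
    total = sum(sq_dist(p, s) for p, s in zip(predicted, selected))
    count = sum(len(p) for p in predicted)
    return total / count if count else 0.0


if __name__ == "__main__":
    trajectory = [[0.0], [1.0], [2.0]]          # toy predicted trajectory
    library = [[0.0], [0.9], [2.1], [5.0]]      # toy sample library
    path = select_samples(trajectory, library, concat_weight=0.0)
    print(path, mse(trajectory, [library[i] for i in path]))
```

This exhaustive search costs on the order of frames times library size squared, which is one reason pruning the candidate set matters for storage and processing speed.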
Bibliographic reference. Wang, Lijuan / Qian, Xiaojun / Han, Wei / Soong, Frank K. (2010): "Synthesizing photo-real talking head via trajectory-guided sample selection", In INTERSPEECH-2010, 446-449.