8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Simple Designing Methods of Corpus-Based Visual Speech Synthesis

Tatsuya Shiraishi (1), Tomoki Toda (2), Hiromichi Kawanami (1), Hiroshi Saruwatari (1), Kiyohiro Shikano (1)

(1) Nara Institute of Science and Technology, Japan
(2) ATR-SLT, Japan

This paper describes simple design methods for corpus-based visual speech synthesis. Our approach requires only a database of synchronously recorded real images and speech. Visual speech is synthesized by concatenating real image segments and speech segments selected from the database. So that every process (e.g. feature extraction, segment selection, and segment concatenation) can be performed automatically, we design two simple types of visual speech synthesis. The first synthesizes visual speech from synchronous real image and speech segments that are selected using speech information only. The second performs both speech segment selection and image segment selection, using features extracted from the database without any manual processing. We carried out objective and subjective experiments to evaluate the two methods. The results show that the second method synthesizes more natural visual speech than the first.
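
The contrast between the two designs can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or cost functions, so the `Segment` class, the one-dimensional `speech` and `image` features, the join cost (absolute feature difference between neighbouring segments), and the function names are all hypothetical. The second function also assumes that image segments are selected independently of speech segments, and it ignores the duration alignment a real system would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    phone: str     # phonetic label of the segment
    speech: float  # toy 1-D speech feature (assumption, stands in for acoustic features)
    image: float   # toy 1-D image feature (assumption, stands in for visual features)


def select(targets, database, feature):
    """Pick one segment per target phone by dynamic programming,
    minimizing the summed jump in `feature` between consecutive segments."""
    # candidates[i] holds every database segment labelled with the i-th target phone
    candidates = [[s for s in database if s.phone == p] for p in targets]
    if any(not c for c in candidates):
        raise ValueError("a target phone is missing from the database")
    # best[j] = (accumulated cost, path) of the cheapest path ending at candidate j
    best = [(0.0, [c]) for c in candidates[0]]
    for cands in candidates[1:]:
        new_best = []
        for c in cands:
            cost, path = min(
                ((pc + abs(feature(pp[-1]) - feature(c)), pp) for pc, pp in best),
                key=lambda t: t[0],
            )
            new_best.append((cost, path + [c]))
        best = new_best
    return min(best, key=lambda t: t[0])[1]


def synthesize_speech_only(targets, database):
    # First design: select with speech information only and reuse
    # the image that was recorded synchronously with each chosen segment.
    chosen = select(targets, database, lambda s: s.speech)
    return [(s.speech, s.image) for s in chosen]


def synthesize_separate(targets, database):
    # Second design (as assumed here): run one selection on speech
    # features and another on image features, then pair the results.
    speech = select(targets, database, lambda s: s.speech)
    image = select(targets, database, lambda s: s.image)
    return [(a.speech, b.image) for a, b in zip(speech, image)]


database = [
    Segment("a", speech=0.0, image=5.0),
    Segment("a", speech=1.0, image=0.0),
    Segment("b", speech=0.0, image=0.0),
]
first = synthesize_speech_only(["a", "b"], database)
second = synthesize_separate(["a", "b"], database)
```

In this toy database the speech-only selection picks the first "a" segment because it joins smoothly in the speech feature, which leaves a large jump in the image feature. Selecting images on their own features avoids that jump, which is one plausible reading of why the second design could look more natural.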

Bibliographic reference.  Shiraishi, Tatsuya / Toda, Tomoki / Kawanami, Hiromichi / Saruwatari, Hiroshi / Shikano, Kiyohiro (2003): "Simple designing methods of corpus-based visual speech synthesis", In EUROSPEECH-2003, 2241-2244.