EUROSPEECH 2003 - INTERSPEECH 2003
This paper describes two simple design methods for corpus-based visual speech synthesis. Our approach requires only a database of synchronously recorded real images and speech. Visual speech is synthesized by concatenating real image segments and speech segments selected from the database. So that all processes, such as feature extraction, segment selection, and segment concatenation, can be performed automatically, we design two simple types of visual speech synthesis. The first synthesizes visual speech from synchronous real image and speech segments that are selected using speech information only. The second performs both speech segment selection and image segment selection, using features extracted from the database without any manual processing. We performed objective and subjective experiments to evaluate the two methods. The results show that the second method synthesizes more natural visual speech than the first.
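The difference between the two designs can be illustrated with a minimal sketch. The abstract does not specify the features or cost functions, so everything below is an assumption made for illustration: the `Segment` fields, the squared-distance join cost, and the Viterbi search are generic unit-selection choices, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """One database unit holding synchronous speech and image data (hypothetical layout)."""
    unit_id: int                     # position in the recorded corpus
    phoneme: str                     # label used to find candidates
    speech_feat: Tuple[float, ...]   # assumed acoustic feature vector
    image_feat: Tuple[float, ...]    # assumed mouth-shape feature vector


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def join_cost(a: Segment, b: Segment, feat: Callable[[Segment], Sequence[float]]) -> float:
    # Units that were adjacent in the recording join for free (assumption).
    if b.unit_id == a.unit_id + 1:
        return 0.0
    return sq_dist(feat(a), feat(b))


def select_segments(phonemes: Sequence[str], database: Sequence[Segment],
                    feat: Callable[[Segment], Sequence[float]]) -> List[Segment]:
    """Viterbi search for the candidate sequence with the lowest total join cost."""
    if not phonemes:
        return []
    candidates = [[s for s in database if s.phoneme == p] for p in phonemes]
    if any(not c for c in candidates):
        raise ValueError("a target phoneme has no candidate in the database")
    cost = [0.0] * len(candidates[0])
    back: List[List[int]] = []
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for c in cur:
            joins = [cost[k] + join_cost(p, c, feat) for k, p in enumerate(prev)]
            k_best = min(range(len(joins)), key=joins.__getitem__)
            new_cost.append(joins[k_best])
            pointers.append(k_best)
        cost = new_cost
        back.append(pointers)
    # Trace the best path backwards from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]


def synth_speech_only(phonemes, database):
    """First design: select with speech features; images follow the same units."""
    units = select_segments(phonemes, database, lambda s: s.speech_feat)
    return units, units


def synth_speech_and_image(phonemes, database):
    """Second design: select speech and image segments with their own features."""
    speech = select_segments(phonemes, database, lambda s: s.speech_feat)
    image = select_segments(phonemes, database, lambda s: s.image_feat)
    return speech, image


# Toy database with made-up one-dimensional features.
toy_db = [
    Segment(0, "a", (0.0,), (0.0,)),
    Segment(10, "b", (1.0,), (5.0,)),
    Segment(20, "a", (4.0,), (4.0,)),
    Segment(30, "b", (4.0,), (0.0,)),
]
```

In this toy database the units that join smoothly in the speech features do not join smoothly in the image features, so the second design picks a different image sequence than the first. A real system would also need a target cost and a way to keep separately selected image and speech segments time-aligned, which this sketch leaves out.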
Bibliographic reference. Shiraishi, Tatsuya / Toda, Tomoki / Kawanami, Hiromichi / Saruwatari, Hiroshi / Shikano, Kiyohiro (2003): "Simple designing methods of corpus-based visual speech synthesis", In EUROSPEECH-2003, 2241-2244.