5th International Conference on Spoken Language Processing
The integration of text-to-speech (TTS) synthesis and animation of synthetic faces enables new applications such as visual human-computer interfaces using agents or avatars. The TTS informs the talking head when phonemes are spoken. The appropriate mouth shapes are animated and rendered while the TTS produces the sound. We call this integrated system of TTS and animation a Visual TTS (VTTS). This paper describes the architecture of an integrated VTTS synthesizer that allows facial expressions to be defined as bookmarks in the text; these expressions are animated while the model is talking. The position of a bookmark in the text defines the start time of the facial expression. The bookmark itself names the expression, its amplitude, and the duration within which the face has to reach that amplitude. A bookmark to face animation parameter (FAP) converter creates a curve defining the amplitude of the given FAP over time using Hermite functions of 3rd order [http://www.research.att.com/info/osterman].
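The bookmark-to-FAP conversion described above can be illustrated with a minimal sketch of third-order (cubic) Hermite interpolation. This is not the authors' implementation: the `Bookmark` fields, the function names, the zero start and end tangents, and the 40 ms frame period are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Hypothetical bookmark record; field names are assumptions."""
    fap_name: str       # name of the expression / FAP, e.g. "smile"
    amplitude: float    # target amplitude to be reached
    duration_ms: float  # time allowed to reach the target amplitude


def hermite(p0: float, p1: float, m0: float, m1: float, s: float) -> float:
    """Cubic Hermite interpolation between p0 and p1 for s in [0, 1].

    m0 and m1 are the tangents at the start and end, expressed per unit
    of normalized time s.
    """
    s2, s3 = s * s, s * s * s
    h00 = 2 * s3 - 3 * s2 + 1   # weight of the start value
    h10 = s3 - 2 * s2 + s       # weight of the start tangent
    h01 = -2 * s3 + 3 * s2      # weight of the end value
    h11 = s3 - s2               # weight of the end tangent
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def fap_curve(bookmark: Bookmark,
              start_amplitude: float = 0.0,
              frame_ms: float = 40.0) -> list[float]:
    """Sample the FAP amplitude once per frame from the bookmark's start time.

    Assumes zero tangents at both ends (ease-in, ease-out); the paper may
    choose tangents differently.
    """
    n = max(1, round(bookmark.duration_ms / frame_ms))
    return [
        hermite(start_amplitude, bookmark.amplitude, 0.0, 0.0, i / n)
        for i in range(n + 1)
    ]


if __name__ == "__main__":
    curve = fap_curve(Bookmark("smile", amplitude=100.0, duration_ms=400.0))
    print([round(a, 1) for a in curve])
```

With zero tangents the amplitude leaves the start value and arrives at the target smoothly, which avoids abrupt jumps in the animated face. Non-zero tangents could be used to chain consecutive bookmarks for the same FAP without stopping at each target.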
Bibliographic reference. Ostermann, Jorn / Beutnagel, Mark C. / Fischer, Ariel / Wang, Yao (1998): "Integration of talking heads and text-to-speech synthesizers for visual TTS", In ICSLP-1998, paper 0931.