INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Evaluation of a Formant-based Speech-driven Lip Motion Generation

Carlos T. Ishi (1), Chaoran Liu (1), Hiroshi Ishiguro (2), Norihiro Hagita (1)

(1) ATR Intelligent Robotics and Communication Labs., Kyoto, Japan
(2) ATR Hiroshi Ishiguro Labs., Osaka, Japan

The background of the present work is the development of a tele-operation system in which the lip motion of a remote humanoid robot is automatically controlled from the operator's voice. In the present paper, we introduce an improved version of our proposed speech-driven lip motion generation method, in which lip height and width degrees are estimated from vowel formant information. The method requires the calibration of only one parameter for speaker normalization, so that no training of dedicated models is necessary. Lip height control is evaluated on the female android robot Geminoid-F and on an animated face. Subjective evaluation indicated that the naturalness of the lip motion generated in the robot is improved by the inclusion of a partial lip width control (with stretching of the lip corners). The highest naturalness scores were achieved for the animated face, showing the effectiveness of the proposed method.

Index Terms: lip motion, formant, tele-operation, humanoid robot.

Bibliographic reference. Ishi, Carlos T. / Liu, Chaoran / Ishiguro, Hiroshi / Hagita, Norihiro (2012): "Evaluation of a formant-based speech-driven lip motion generation", In INTERSPEECH-2012, 114-117.