Second ESCA/IEEE Workshop on Speech Synthesis

September 12-15, 1994
Mohonk Mountain House, New Paltz, NY, USA

Analysis and Synthesis of Fundamental Frequency Contours for the Spoken Dialogue in Japanese

Keikichi Hirose (1), Mayumi Sakata (1), Masafumi Osame (2), Hiroya Fujisaki (2)

(1) Department of Electronic Engineering, Faculty of Engineering, University of Tokyo, Japan
(2) Department of Applied Electronics, Science University of Tokyo, Yamazaki, Noda, Japan

Prosodic features of spoken dialogue were analyzed and compared with those of read speech. Fundamental frequency showed a wider dynamic range in dialogue, accompanied by a higher mean value. The wider range is caused by increases in both the amplitude of accent commands and the magnitude of phrase commands. Syllable duration also showed a wider dynamic range, and on average it was considerably shorter. Based on these results, prosodic rules for spoken dialogue were constructed by modifying those already developed for read speech. Focal control was also addressed and incorporated into the rules. The validity of the rules was confirmed by a listening test with synthetic speech.
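
The terms "accent command" and "phrase command" belong to the command-response (Fujisaki) model of fundamental frequency contours, in which log F0 is a baseline plus the responses to phrase impulses and accent steps. The abstract gives no equations or parameter values, so the following is only a minimal sketch under stated assumptions: the time constants (alpha, beta, gamma) are commonly quoted textbook values, and every command timing and magnitude is invented for illustration, not taken from the paper. It shows how larger commands widen the F0 range and raise its mean, as the abstract reports for dialogue.

```python
import math

# Assumed, commonly quoted constants; the paper's actual values are not given here.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_hz, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time.

    phrase_cmds: list of (onset_s, magnitude)
    accent_cmds: list of (onset_s, offset_s, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_hz)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


def f0_range_semitones(contour):
    """Dynamic range of a contour, in semitones."""
    return 12.0 * math.log2(max(contour) / min(contour))


# Hypothetical commands: same timing, larger magnitudes for the "dialogue" case.
times = [i * 0.01 for i in range(200)]
read_f0 = f0_contour(times, 120.0, [(0.0, 0.4)], [(0.3, 0.8, 0.3)])
dialogue_f0 = f0_contour(times, 120.0, [(0.0, 0.6)], [(0.3, 0.8, 0.5)])
```

Comparing `f0_range_semitones(read_f0)` with `f0_range_semitones(dialogue_f0)`, and the two means, reproduces the qualitative direction of the reported effect only; it says nothing about the sizes measured in the study.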


Bibliographic reference. Hirose, Keikichi / Sakata, Mayumi / Osame, Masafumi / Fujisaki, Hiroya (1994): "Analysis and synthesis of fundamental frequency contours for the spoken dialogue in Japanese", in SSW2-1994, 167-170.