The ESCA Workshop on Speech Synthesis

September 25-28, 1990
Autrans, France

The Control of Segmental Duration in Speech Synthesis Using Linguistic Properties

Nobuyoshi Kaiki (1), Kazuya Takeda (2), Yoshinori Sagisaka (2)

(1) ATR Interpreting Telephony Research Laboratories, Seika-cho, Soraku-gun, Kyoto, Japan
(2) KDD Kamifukuoka R&D Laboratories, Kamifukuoka-city, Saitama, Japan

In this paper, duration control factors are statistically analyzed using Japanese speech data uttered by four speakers. According to previous studies, the important factors are phoneme category, neighboring phonemes, position in a breath group, and mora count of a breath group. In addition to these factors, we introduce several new control factors: position in a phrase, mora count of a phrase, content/function word category, and temporal compensation caused by geminated consonants. Using these statistically significant factors, a segmental duration model is proposed for Japanese speech synthesis. Duration prediction experiments using this model showed that the root mean square errors between predicted and observed durations were 15.30 ms (19.6% of the average length) on the analyzed data and 15.84 ms (19.9% of the average length) on the test data.
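
The error measure quoted above can be illustrated with a minimal sketch. It assumes that "average length" means the mean observed segmental duration, which the abstract does not state explicitly. Under that reading, the reported figures would imply a mean duration of roughly 78 ms (15.30 / 0.196) for the analyzed data. The function names and the duration values below are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length duration lists (ms)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared_error / len(predicted))


def relative_rmse(predicted, observed):
    """RMSE as a fraction of the mean observed duration (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(predicted, observed) / mean_observed


# Hypothetical segmental durations in milliseconds, for illustration only.
observed_ms = [60.0, 80.0, 100.0, 80.0]
predicted_ms = [70.0, 70.0, 90.0, 90.0]

error_ms = rmse(predicted_ms, observed_ms)        # 10.0 ms
ratio = relative_rmse(predicted_ms, observed_ms)  # 0.125, i.e. 12.5%
print(f"RMSE = {error_ms:.2f} ms ({ratio:.1%} of the mean duration)")
```

Reporting the error relative to the mean duration makes results comparable across data sets whose average segment lengths differ, which is presumably why the abstract gives both figures.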

Bibliographic reference. Kaiki, Nobuyoshi / Takeda, Kazuya / Sagisaka, Yoshinori (1990): "The control of segmental duration in speech synthesis using linguistic properties", in SSW1-1990, 165-168.