The ESCA Workshop on Speech Synthesis
September 25-28, 1990
In this paper, duration control factors are statistically analyzed using Japanese speech data uttered by four speakers. According to previous studies, the important factors are phoneme category, neighboring phonemes, position in a breath group, and the mora count of a breath group. In addition to these factors, we introduce several new control factors: position in a phrase, the mora count of a phrase, content/function word category, and temporal compensation caused by geminated consonants. Using the factors found to be statistically significant, a segmental duration model is proposed for Japanese speech synthesis. Duration prediction experiments using this model showed that the root mean square error between predicted and observed durations was 15.30 ms (19.6% of the average length) on the analyzed data and 15.84 ms (19.9% of the average length) on the test data.
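The error measure reported in the abstract can be illustrated with a minimal Python sketch. It computes the root mean square error between predicted and observed segment durations, and that error as a fraction of the average duration. The function names and the sample durations are hypothetical and are not taken from the paper; the sketch also assumes that "average length" means the mean observed segment duration, which the abstract does not state explicitly.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length duration lists (ms)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse(predicted, observed):
    """RMSE divided by the mean observed duration (assumed normalization)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(predicted, observed) / mean_observed


# Hypothetical segment durations in milliseconds, for illustration only.
predicted_ms = [70.0, 90.0, 60.0, 100.0]
observed_ms = [80.0, 80.0, 70.0, 90.0]

error_ms = rmse(predicted_ms, observed_ms)
error_ratio = relative_rmse(predicted_ms, observed_ms)
print(f"RMSE: {error_ms:.2f} ms ({error_ratio:.1%} of the average length)")
```

Under this reading, the reported 15.30 ms at 19.6% would correspond to an average segment duration of roughly 78 ms in the analyzed data.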
Bibliographic reference. Kaiki, Nobuyoshi / Takeda, Kazuya / Sagisaka, Yoshinori (1990): "The control of segmental duration in speech synthesis using linguistic properties", In SSW1-1990, 165-168.