Second ESCA/IEEE Workshop on Speech Synthesis
September 12-15, 1994
The authors analyzed the fundamental frequency (F0) contours of Japanese sentences spoken in four styles (unmarked, hurried, angry and gentle) with the aim of synthesizing natural-sounding speech. Thirty-five sentences in each speaking style, spoken by a professional narrator, were analyzed. The parameters of the F0 generation model proposed by Fujisaki, i.e. the minimum value of F0 (Fmin), the amplitude of the phrase commands (Ap) and the amplitude of the accent commands (Aa), are used as key factors in the analysis. In the sentences spoken angrily, Fmin is kept high, and the change due to both the phrase component and the accent component is minimal. Consequently, the F0 contours of sentences spoken angrily are flat. In the sentences spoken gently, on the other hand, the dynamic range due to the accent component is greater than in the other styles, and to keep it high the amplitude of the phrase component is correspondingly suppressed. The F0 contours of the sentences spoken hurriedly are similar to those of the sentences spoken normally, except that the amplitude of the accent commands is slightly smaller. These parameters were found to be useful for expressing the differences between speaking styles.
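The role of the three parameters can be illustrated with a minimal sketch of the commonly cited form of Fujisaki's model, in which ln F0 is the sum of a baseline, phrase components scaled by Ap, and accent components scaled by Aa. The abstract gives no equations, so several things here are assumptions: Fmin is used as the baseline, the constants ALPHA, BETA and GAMMA are typical textbook values, and the command timings and amplitudes for the "unmarked" and "angry" examples are hypothetical numbers chosen only to show the direction of the effect. None of them are measurements from the paper.

```python
import math

# Assumed typical constants; not taken from the paper.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, Ga(t)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fmin, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time in `times`.

    phrase_cmds: list of (onset_time, Ap)
    accent_cmds: list of (onset_time, offset_time, Aa)
    """
    contour = []
    for t in times:
        log_f0 = math.log(fmin)  # baseline term (Fmin assumed as baseline)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


def log_range(contour):
    """Dynamic range of a contour on the log-frequency scale."""
    return math.log(max(contour)) - math.log(min(contour))


times = [i * 0.01 for i in range(200)]  # 2 s sampled every 10 ms

# Hypothetical parameter sets, chosen only for illustration.
unmarked = f0_contour(times, 100.0, [(0.0, 0.5)], [(0.3, 0.8, 0.5)])
angry = f0_contour(times, 160.0, [(0.0, 0.1)], [(0.3, 0.8, 0.1)])

print(f"unmarked log range: {log_range(unmarked):.3f}")
print(f"angry    log range: {log_range(angry):.3f}")
```

With a high Fmin and small Ap and Aa, the second contour sits higher and varies less on the log scale, which is the sense in which the angry contours are described as flat.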
Bibliographic reference. Higuchi, Norio / Hirai, Toshio / Sagisaka, Yoshinori (1994): "Effect of speaking style on parameters of fundamental frequency contour", In SSW2-1994, 135-138.