Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Corpus-Based Synthesis of Fundamental Frequency Contours with Various Speaking Styles from Text Using F0 Contour Generation Process Model

Keikichi Hirose (1), Kentaro Sato (1), Nobuaki Minematsu (2)

(1) Dept. of Frontier Informatics, School of Frontier Sciences, University of Tokyo, Japan
(2) Dept. of Inf. and Commun. Engg., School of Inf. Science and Tech., University of Tokyo, Japan

A corpus-based method was developed for generating fundamental frequency (F0) contours of various speaking styles from text. Instead of predicting F0 values directly, the method predicts the command values of the F0 contour generation process model. Because of the constraint imposed by the model, the resulting F0 contour retains a certain degree of naturalness even when the prediction is incorrect. The method includes a scheme for automatic extraction of the model commands, which is necessary for preparing training corpora for various speaking styles. Introducing constraints on phrase command locations improved the extraction, which in turn led to better performance of the method. Speech synthesis was conducted using an HMM speech synthesizer for calm speech and three types of emotional speech. A perceptual experiment showed that the designated emotions were conveyed well by the F0 contours generated with the developed method.
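To illustrate why predicting model commands rather than raw F0 values constrains the output, the following minimal sketch generates an F0 contour from phrase and accent commands. The abstract does not state the model equations, so the sketch assumes the commonly published form of the generation process model: log F0 is a baseline plus phrase components (impulse responses of a critically damped second-order system) plus accent components (step responses, clipped at a ceiling). The constants `ALPHA`, `BETA`, and `GAMMA`, all function names, and the command values in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import math

# Assumed typical constants (not given in the abstract).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # relative ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before onset."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 values (Hz) at the given times (s).

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands: one phrase command at 0 s, one accent from 0.3 s to 0.7 s.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.7, 0.4)])
```

Because every contour produced this way is a sum of smooth responses added to a baseline in the log domain, a mispredicted magnitude or timing shifts or rescales a component but still yields a smooth, positive F0 curve, which is consistent with the naturalness claim in the abstract.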

