A complete scheme for generating prosodic features from text input was constructed. The method consists of corpus-based prediction of pauses, phone durations, and fundamental frequencies (F0s), in this order, with information predicted at an earlier stage used in the subsequent stages. Since F0 prediction is carried out on the command values of the F0 contour generation process model rather than on F0 values directly, stable and flexible control of F0 contours is possible. Adding constraints on the accent command timings as a post-processing step improved the quality of speech synthesized using the prosodic features generated by the method. The validity of the developed method was confirmed through a listening test of the synthetic speech.
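To illustrate what predicting command values rather than F0 values means, the following is a minimal sketch of how an F0 contour can be rebuilt from commands. It assumes that the generation process model is the command-response formulation commonly known as the Fujisaki model, in which log F0 is a baseline plus phrase and accent components. The constants (alpha = 3.0, beta = 20.0, gamma = 0.9), the function names, and the example command values are illustrative assumptions and are not taken from the paper.

```python
import math

# Commonly quoted default constants for the model (assumed, not from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 values (Hz) at the given times (s).

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(
    times,
    base_f0=100.0,
    phrase_cmds=[(0.0, 0.5)],
    accent_cmds=[(0.3, 0.7, 0.4)],
)
```

Because the contour is fully determined by a handful of command magnitudes and timings, a predictor that outputs those values always yields a smooth contour, and a post-processing step can adjust a command's onset or offset time without touching individual F0 samples.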
Bibliographic reference. Hirose, Keikichi / Ochi, Keiko / Minematsu, Nobuaki (2007): "Corpus-based generation of prosodic features from text based on generation process model", In INTERSPEECH-2007, 1274-1277.