EUROSPEECH 2001 Scandinavia
This paper proposes a method for generating F0 contours for speech synthesis from the segmental shapes of natural F0. The extracted shapes of the F0 units are kept essentially unchanged: the analysis phase involves no averaging, and the synthesis phase keeps modification to a minimum. Using these unmodified F0 shapes has great potential for covering a wide variety of speaking styles within a single framework, including not only read speech but also dialogue and emotive speech. A statistical model based on linear regression is proposed to "manipulate" the stored raw F0 shapes and assemble them into a sentential F0 contour. In experimental evaluations, the proposed model provides robust F0 contour prediction. With the model, linguistically derived information about a sentence can be mapped directly, in a purely data-driven manner, to the acoustic F0 values of the sentential intonation contour for a speaker on whom the model has been trained.
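As a rough illustration of the general idea, and not the authors' actual model, the following sketch fits a least-squares linear regression from categorical linguistic features to a single placement parameter (a log-F0 offset) for each stored F0 shape, then shifts the shape by the predicted offset without altering its form. The abstract does not specify the features, the predicted parameters, or the units, so the names `one_hot`, `fit_shift_model`, `predict_shift`, and `place_shape`, the two example features (phrase position and accent type), and the choice of a log-F0 offset as the target are all assumptions made here for illustration.

```python
import numpy as np

def one_hot(rows, vocab_sizes):
    """Build a design matrix: a bias column plus one-hot columns per feature.

    rows: sequences of integer category indices, one index per feature.
    vocab_sizes: number of categories for each feature (hypothetical features).
    """
    X = np.zeros((len(rows), 1 + sum(vocab_sizes)))
    X[:, 0] = 1.0  # bias term
    for i, row in enumerate(rows):
        offset = 1
        for value, size in zip(row, vocab_sizes):
            X[i, offset + value] = 1.0
            offset += size
    return X

def fit_shift_model(rows, shifts, vocab_sizes):
    """Least-squares fit of per-unit log-F0 offsets from linguistic features."""
    X = one_hot(rows, vocab_sizes)
    y = np.asarray(shifts, dtype=float)
    # The one-hot design is rank-deficient; lstsq returns the minimum-norm solution.
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def predict_shift(weights, row, vocab_sizes):
    """Predict the log-F0 offset for one unit from its feature indices."""
    return float((one_hot([row], vocab_sizes) @ weights)[0])

def place_shape(shape_log_f0, shift):
    """Shift a stored log-F0 shape as a whole; the shape itself is not modified."""
    return np.asarray(shape_log_f0, dtype=float) + shift

if __name__ == "__main__":
    # Assumed features: (phrase position, accent type), two categories each.
    vocab_sizes = (2, 2)
    train_rows = [(0, 0), (0, 1), (1, 0)]
    train_shifts = [5.0, 5.2, 4.7]  # illustrative log-F0 offsets, not real data
    w = fit_shift_model(train_rows, train_shifts, vocab_sizes)

    stored_shape = [0.00, 0.05, 0.02, -0.04]  # illustrative zero-centred shape
    shift = predict_shift(w, (1, 1), vocab_sizes)
    print(place_shape(stored_shape, shift))
```

In this sketch the regression only decides where each stored shape sits on the log-F0 axis, which is one simple way to keep the natural shape intact while still assembling a sentence-level contour; a fuller model would presumably predict additional parameters.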
Bibliographic reference. Saito, Takashi / Sakamoto, Masaharu (2001): "Generating F0 contours by statistical manipulation of natural F0 shapes", In EUROSPEECH-2001, 1171-1174.