A method was developed for generating sentence F0 contours of Japanese when a focus is placed on one of the “bunsetsu” of an utterance. It controls F0 through the commands of the F0 model (the generation process model), rather than predicting F0 frame by frame as in HMM-based speech synthesis. The method first predicts the differences in the F0 model commands between utterances with and without focus, and then applies them to the F0 model commands predicted beforehand by a baseline method without focus assignment. The baseline method is trained on a large corpus, whereas the corpus for training the command differences can be small and need not be uttered by the same speaker as the large corpus. The validity of the method was confirmed by experiments on F0 contour generation and speech synthesis, including interpolation/extrapolation of the F0 model commands to control the focus level.
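The two-step idea (baseline commands plus predicted focus differences, scaled to control focus level) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `AccentCommand` class, the `apply_focus` function, the `level` parameter, and the restriction to accent command amplitudes are all hypothetical choices made here, and the numeric values are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccentCommand:
    """Hypothetical container for one accent command of the F0 model."""
    onset: float      # command onset time in seconds
    offset: float     # command offset time in seconds
    amplitude: float  # command amplitude


def apply_focus(baseline: List[AccentCommand],
                amplitude_diffs: List[float],
                level: float = 1.0) -> List[AccentCommand]:
    """Add scaled focus differences to baseline command amplitudes.

    level = 0 reproduces the baseline (no focus), level = 1 applies the
    predicted differences as they are, 0 < level < 1 interpolates, and
    level > 1 extrapolates. Clamping at zero is an assumption made here
    so that amplitudes never become negative.
    """
    if len(baseline) != len(amplitude_diffs):
        raise ValueError("one difference is needed per baseline command")
    focused = []
    for cmd, diff in zip(baseline, amplitude_diffs):
        amplitude = max(0.0, cmd.amplitude + level * diff)
        focused.append(AccentCommand(cmd.onset, cmd.offset, amplitude))
    return focused


# Illustrative values only: two accent commands, focus on the first one.
baseline = [AccentCommand(0.10, 0.40, 0.30), AccentCommand(0.60, 0.90, 0.40)]
diffs = [0.20, -0.10]

no_focus = apply_focus(baseline, diffs, level=0.0)
full_focus = apply_focus(baseline, diffs, level=1.0)
strong_focus = apply_focus(baseline, diffs, level=2.0)
```

Keeping the differences separate from the baseline commands is what allows the focus level to be varied with a single scalar, and it mirrors the point that the differences could be trained on a different, smaller corpus than the baseline.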
Index Terms: Generation process model, F0 contour, Corpus-based method, Speech synthesis, Prosodic focus
Cite as: Ochi, K., Hirose, K., Minematsu, N. (2010) Realization of prosodic focuses in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model. Proc. Speech Prosody 2010, paper 880
@inproceedings{ochi10_speechprosody, author={Keiko Ochi and Keikichi Hirose and Nobuaki Minematsu}, title={{Realization of prosodic focuses in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model}}, year=2010, booktitle={Proc. Speech Prosody 2010}, pages={paper 880} }