Generative Modeling of F0 Contours Leveraged by Phrase Structure and Its Application to Statistical Focus Control

Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu


In this paper, we propose a statistical generative model of fundamental frequency (F0) contours that incorporates the phrase structure of Japanese ("bunsetsu"), and we apply this model to controlling the focus point in a sentence. The Fujisaki model is a mathematical model that formulates F0 contours as the superposition of phrase and accent components, taking into account the control mechanism of vocal fold vibration. In the Fujisaki model, the model parameters are closely related to linguistic information. Thus, flexible and interpretable conversion of F0 contours in accordance with linguistic information can be achieved by changing the model parameters. Recently, a method that treats the Fujisaki model as a stochastic model has been proposed. In this method, the model parameters are inferred from observed F0 contours in a maximum likelihood manner. However, since inference is not constrained by linguistic information, unnatural parameters are occasionally estimated. In the proposed method, the occurrence of phrase commands is linked to bunsetsu boundaries, so that the Fujisaki model parameters and the phrase structure correspond to each other. This enables simultaneous modeling of two different F0 contours in every bunsetsu unit. The proposed modeling can be applied to pairs of neutral and focused utterances, and it enables bunsetsu-by-bunsetsu focus control. Experimental results show that the proposed method achieves reasonable focus control, with an accuracy rate of 74% compared with natural speech. Though there is room for improvement in naturalness, the proposed scheme achieves interpretable conversion of prosody.
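
As an illustration of the superposition described in the abstract, the following is a minimal sketch of the standard deterministic Fujisaki model, in which ln F0(t) is the sum of a baseline, the phrase component, and the accent component. It is not the authors' stochastic model or their focus-control method. The function names, the command values, and the constants alpha = 3.0, beta = 20.0, and gamma = 0.9 are assumptions chosen for illustration only.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism, Gp(t).

    alpha is an assumed value, not one taken from the paper.
    """
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, Ga(t), capped at gamma.

    beta and gamma are assumed values, not ones taken from the paper.
    """
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_f0, phrase_cmds, accent_cmds):
    """Return ln F0(t) = ln Fb + phrase component + accent component.

    phrase_cmds: list of (onset_time, magnitude) impulses
    accent_cmds: list of (onset_time, offset_time, amplitude) steps
    """
    phrase = sum(mag * phrase_response(t - t0) for t0, mag in phrase_cmds)
    accent = sum(
        amp * (accent_response(t - t1) - accent_response(t - t2))
        for t1, t2, amp in accent_cmds
    )
    return math.log(base_f0) + phrase + accent


# Hypothetical commands: one phrase command at 0 s and one accent command
# from 0.3 s to 0.7 s. Doubling the accent amplitude is a hand-made stand-in
# for emphasis, used only to show how a parameter change moves the contour.
phrase_cmds = [(0.0, 0.5)]
neutral = log_f0(0.5, 120.0, phrase_cmds, [(0.3, 0.7, 0.4)])
emphasized = log_f0(0.5, 120.0, phrase_cmds, [(0.3, 0.7, 0.8)])
```

Because each command enters the contour additively in the log-F0 domain, changing a single amplitude alters only that command's contribution, which is the sense in which parameter changes remain interpretable.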


DOI: 10.21437/SSW.2019-41

Cite as: Shirahata, Y., Saito, D., Minematsu, N. (2019) Generative Modeling of F0 Contours Leveraged by Phrase Structure and Its Application to Statistical Focus Control. Proc. 10th ISCA Speech Synthesis Workshop, 228-233, DOI: 10.21437/SSW.2019-41.


@inproceedings{Shirahata2019,
  author={Yuma Shirahata and Daisuke Saito and Nobuaki Minematsu},
  title={{Generative Modeling of F0 Contours Leveraged by Phrase Structure and Its Application to Statistical Focus Control}},
  year=2019,
  booktitle={Proc. 10th ISCA Speech Synthesis Workshop},
  pages={228--233},
  doi={10.21437/SSW.2019-41},
  url={http://dx.doi.org/10.21437/SSW.2019-41}
}