September 22-25, 1997
In this paper, we describe a model of fundamental frequency (F0) control. F0 control for Japanese text-to-speech systems generally uses a two-stage model consisting of a global model and a local model. As the global model, we propose a transition network that generates the parameters of a local pitch model from the linguistic parameters of a sentence. In the proposed model, syntactic analysis and the generation of F0 parameters are integrated, and the nodes of the network represent the linguistic and prosodic state of the sentence. The parameters of the local model are generated when a transition is taken. We also propose a training method for the network. The prediction results show that our model can predict the phrasal accent parameters with satisfactorily high accuracy. We also describe how the model can be applied to the prediction of pause positions.
Bibliographic reference. Ishikawa, Yasushi / Ebihara, Takashi (1997): "On the global F0 shape model using a transition network for Japanese text-to-speech systems", In EUROSPEECH-1997, 2679-2682.