5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

On the Global F0 Shape Model Using a Transition Network for Japanese Text-to-Speech Systems

Yasushi Ishikawa, Takashi Ebihara

Information Technology R&D Center, Mitsubishi Electric Corporation, Ofuna, Kamakura, Kanagawa, Japan

In this paper, we describe a model of fundamental frequency (F0) control. Japanese text-to-speech systems generally control F0 with a two-stage model consisting of a global model and a local model. For the global model, which generates the parameters of a local pitch model from the linguistic parameters of a sentence, we propose a model represented by a transition network. In the proposed model, syntactic analysis and the generation of F0 parameters are integrated, and the nodes of the network represent the linguistic and prosodic state of the sentence. The parameters of the local model are generated as each transition is taken. We also propose a training method for the network. Prediction experiments showed that our model can predict the phrasal accent parameters with satisfactorily high accuracy. We also describe how the model can be applied to the prediction of pause positions.
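The abstract does not specify the node inventory, the linguistic labels, or the training procedure. The following Python sketch is therefore only an illustration of the general idea: a finite-state network that accepts a sequence of linguistic labels (its "syntactic analysis" role) and emits one local-model parameter, such as a phrasal accent amplitude, on every transition it takes. The class `TransitionNetwork`, the labels `N`/`V`, the states, and the mean-of-observations training rule are all hypothetical assumptions, not the authors' method.

```python
from collections import defaultdict
from statistics import mean


class TransitionNetwork:
    """Toy global F0 model (hypothetical): transitions are taken on linguistic
    labels, and each transition emits one local-model parameter value."""

    def __init__(self, transitions, start="S"):
        # transitions maps (state, label) -> next state
        self.transitions = transitions
        self.start = start
        self.emissions = {}  # (state, label) -> trained parameter value

    def _walk(self, labels):
        """Yield the (state, label) key of each transition taken; reject
        label sequences the network cannot accept."""
        state = self.start
        for label in labels:
            key = (state, label)
            if key not in self.transitions:
                raise ValueError(f"no transition from {state!r} on {label!r}")
            yield key
            state = self.transitions[key]

    def train(self, corpus):
        """corpus: iterable of (labels, params), one observed value per label.
        Assumed training rule: each transition emits the mean of the values
        observed on it in the training data."""
        observed = defaultdict(list)
        for labels, params in corpus:
            if len(labels) != len(params):
                raise ValueError("need exactly one parameter value per label")
            for key, value in zip(self._walk(labels), params):
                observed[key].append(value)
        self.emissions = {key: mean(values) for key, values in observed.items()}

    def predict(self, labels):
        """Return one parameter per transition taken (raises KeyError if a
        transition was never observed during training)."""
        return [self.emissions[key] for key in self._walk(labels)]


# Toy network: state A = "previous phrase was N", B = "previous phrase was V".
transitions = {
    ("S", "N"): "A", ("S", "V"): "B",
    ("A", "N"): "A", ("A", "V"): "B",
    ("B", "N"): "A", ("B", "V"): "B",
}
net = TransitionNetwork(transitions)
# Invented toy values for illustration only; not data from the paper.
net.train([
    (["N", "V"], [0.5, 0.3]),
    (["N", "N", "V"], [0.7, 0.4, 0.2]),
])
print(net.predict(["N", "N", "V"]))
```

Because the emitted value is attached to the transition rather than to the label alone, the same label (here `N`) can receive different parameters depending on the state the network is in, which is one way a network state can encode sentence context.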

