This paper proposes a new method for stochastically modeling the effects of dialog context on prosody, with the aim of generating natural spoken language in dialog systems. F0 parameters are predicted quantitatively by two rule sets: the T-rule and the D-rule. The T-rule is a conventional rule set for text-to-speech conversion; it models the characteristics of isolated utterances without dialog context. The D-rule compensates for the T-rule by modifying the F0 parameters according to the dialog context after the T-rule has been applied. The D-rule was learned inductively from the prediction errors that the T-rule produced on dialog utterances: a linear regression model fitted these errors using seven manually annotated dialog features. Evaluation results for the D-rule are also reported for dialog utterances recorded under two different conditions.
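The abstract does not give the exact formulation of the D-rule. As a rough illustration only, the sketch below assumes that the D-rule is an ordinary least-squares linear model. The regression target is the error the T-rule leaves on one F0 parameter (observed value minus T-rule prediction). The inputs are the seven annotated dialog features, encoded as numbers. The corrected value is the T-rule prediction plus the regression output. The function names `fit_d_rule` and `apply_d_rule`, the numeric feature encoding, and the intercept term are all hypothetical. The paper may, for example, encode features categorically or fit a separate model per F0 parameter.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's matrices are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_d_rule(
    features: Sequence[Sequence[float]],
    observed: Sequence[float],
    t_rule_pred: Sequence[float],
) -> List[float]:
    """Hypothetical D-rule: least-squares fit of T-rule errors on dialog features.

    Returns [intercept, w_1, ..., w_k] (k = 7 in the paper's setting).
    """
    # Target: the error the context-free T-rule leaves on each dialog utterance.
    errors = [o - p for o, p in zip(observed, t_rule_pred)]
    # Design matrix with a leading 1 for the (assumed) intercept term.
    rows = [[1.0] + [float(v) for v in f] for f in features]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T e.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xte = [sum(r[i] * e for r, e in zip(rows, errors)) for i in range(k)]
    return _solve(xtx, xte)


def apply_d_rule(
    t_rule_value: float, feature_vec: Sequence[float], coefs: Sequence[float]
) -> float:
    """Add the learned dialog-context correction to a T-rule F0 prediction."""
    correction = coefs[0] + sum(w * x for w, x in zip(coefs[1:], feature_vec))
    return t_rule_value + correction
```

In this sketch the regression is fitted to residuals rather than to raw F0 values. That leaves the T-rule untouched for isolated utterances, so the linear model only has to capture the adjustment attributable to dialog context.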
Bibliographic reference. Yamashita, Y. / Mizoguchi, R. (1995): "Modeling the contextual effects on prosody in dialog", In EUROSPEECH-1995, 1329-1332.