This paper proposes statistical modelling of segmental duration for English speech synthesis using Multiple Split Regression (MSR) and a hierarchical error function. Statistical duration control for English must account for two characteristics of English duration: interactions between control factors and the hierarchical structure of timing. A modelling method suited to both is therefore required. MSR is a statistical modelling method whose structure is determined dynamically from data by a combinatorial optimization technique. It includes both linear regression and tree regression as special cases and extends them, so it can properly express interactions between the control factors for duration. The hierarchical predictive error function is adopted to analyze the hierarchical structure of duration control at the syllable and segmental levels. Experimental results show that MSR achieves higher multiple correlation than either linear or tree regression with the same number of free parameters. Moreover, error analysis with the hierarchical predictive error function shows that interactions exist between factors at the segmental and syllable levels in duration control, and that predictive errors in segmental durations are compensated within a syllable.
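The closing claim of the abstract, that predictive errors in segmental durations are compensated within a syllable, can be illustrated with a minimal sketch. This is not the paper's hierarchical predictive error function, whose exact form the abstract does not give. It assumes a simple two-level measure: the root-mean-square error of individual segment predictions, and the root-mean-square error of syllable durations obtained by summing the signed segment errors within each syllable. The function name `hierarchical_errors` and all duration values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def hierarchical_errors(observed, predicted, syllable_ids):
    """Return (segment-level RMSE, syllable-level RMSE) in the input units.

    Assumed two-level measure for illustration only, not the error
    function defined in the paper.
    """
    if not (len(observed) == len(predicted) == len(syllable_ids)):
        raise ValueError("inputs must have equal length")
    if not observed:
        raise ValueError("inputs must not be empty")

    # Signed prediction error for every segment.
    seg_errors = [p - o for o, p in zip(observed, predicted)]

    # Syllable error = signed sum of the segment errors in that syllable,
    # so over- and under-predictions inside a syllable can cancel.
    syl_errors = defaultdict(float)
    for err, syl in zip(seg_errors, syllable_ids):
        syl_errors[syl] += err

    seg_rmse = sqrt(sum(e * e for e in seg_errors) / len(seg_errors))
    syl_rmse = sqrt(sum(e * e for e in syl_errors.values()) / len(syl_errors))
    return seg_rmse, syl_rmse


# Hypothetical segment durations in milliseconds (not data from the paper).
observed = [80.0, 120.0, 60.0, 90.0, 110.0]
predicted = [90.0, 112.0, 55.0, 100.0, 105.0]
syllable_ids = [0, 0, 1, 1, 1]  # which syllable each segment belongs to

seg_rmse, syl_rmse = hierarchical_errors(observed, predicted, syllable_ids)
print(f"segment RMSE:  {seg_rmse:.2f} ms")
print(f"syllable RMSE: {syl_rmse:.2f} ms")
```

In this toy data the segment errors within each syllable have opposite signs, so the syllable-level error is much smaller than the segment-level error. That gap is the kind of compensation the abstract describes. If segment errors were independent, the syllable-level error would instead tend to grow with the number of segments per syllable.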
Bibliographic reference. Iwahashi, Naoto / Sagisaka, Yoshinori (1993): "Duration modelling with multiple split regression", In EUROSPEECH'93, 329-332.