Hidden Markov Model (HMM) based speech synthesis that uses a Decision Tree (DT) for duration prediction is known to produce over-averaged rhythm. To alleviate this problem, this paper proposes a two-level duration prediction method combined with outlier removal. The method takes advantage of the accurate regression capability of the Extreme Learning Machine (ELM) for phone-level duration prediction, and of the DT's ability to distribute state durations for state-level duration prediction. Experimental results showed that the method decreased the RMSE of phone duration, increased the fluctuation of syllable duration, and achieved 63.75% in the preference evaluation. Furthermore, the method does not require laborious manual alignment of the training corpus.
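The two-level idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the input features, the ELM configuration, or the way the two levels are combined. The sketch assumes a basic ELM (fixed random hidden layer, output weights solved by ridge-regularised least squares), synthetic data in place of linguistic features, and a hypothetical `distribute_to_states` helper that rescales per-state durations (standing in for DT output) so that they sum to the phone-level prediction.

```python
import numpy as np


def _hidden(X, W, b):
    # Random-feature hidden layer plus a constant column acting as an output bias.
    H = np.tanh(X @ W + b)
    return np.hstack([H, np.ones((X.shape[0], 1))])


def train_elm(X, y, n_hidden=50, ridge=1e-3, seed=0):
    """Fit a basic ELM regressor: hidden weights are random and never trained."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, W, b)
    # Output weights come from ridge-regularised least squares.
    A = H.T @ H + ridge * np.eye(H.shape[1])
    beta = np.linalg.solve(A, H.T @ y)
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return _hidden(X, W, b) @ beta


def distribute_to_states(phone_duration, state_durations):
    """Assumed combination rule: rescale state durations to sum to the phone duration."""
    state_durations = np.asarray(state_durations, dtype=float)
    return state_durations * (phone_duration / state_durations.sum())


# Synthetic stand-in for (context features, phone duration in frames).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 10.0 + X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5])

model = train_elm(X, y)
train_pred = predict_elm(model, X)
rmse = float(np.sqrt(np.mean((train_pred - y) ** 2)))

# Phone-level prediction for one phone, then state-level distribution.
phone_pred = float(predict_elm(model, X[:1])[0])
dt_state_durations = [2.0, 3.0, 4.0, 3.0, 2.0]  # hypothetical DT output, 5 states
states = distribute_to_states(phone_pred, dt_state_durations)
```

In this sketch the phone-level regressor fixes the total duration while the state-level values only determine how that total is split, which is one plausible reading of the division of labour described in the abstract.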
Bibliographic reference. Wang, Yang / Yang, Minghao / Wen, Zhengqi / Tao, Jianhua (2015): "Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis", In INTERSPEECH-2015, 2197-2201.