16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Combining Extreme Learning Machine and Decision Tree for Duration Prediction in HMM Based Speech Synthesis

Yang Wang, Minghao Yang, Zhengqi Wen, Jianhua Tao

Chinese Academy of Sciences, China

Hidden Markov Model (HMM) based speech synthesis that uses a Decision Tree (DT) for duration prediction is known to produce over-averaged rhythm. To alleviate this problem, this paper proposes a two-level duration prediction method combined with outlier removal. The method takes advantage of the accurate regression capability of the Extreme Learning Machine (ELM) for phone-level duration prediction, and of the DT's ability to distribute state durations for state-level duration prediction. Experimental results showed that the method decreased the RMSE of phone duration, increased the fluctuation of syllable duration, and achieved 63.75% in the preference evaluation. Furthermore, the method does not require laborious manual alignment of the training corpus.
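The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the ELM configuration, the input features, or the outlier-removal rule, so everything below is an assumption. `ELMRegressor` is a generic ELM (random fixed hidden layer, output weights solved by ridge-regularized least squares), and `distribute_to_states` is a hypothetical stand-in for the DT-based state-level step that simply rescales assumed per-state mean durations so they sum to the predicted phone duration.

```python
import numpy as np


class ELMRegressor:
    """Generic extreme learning machine for regression (illustrative only).

    Hidden size, tanh activation, and the ridge term are assumptions,
    not values taken from the paper.
    """

    def __init__(self, n_hidden=50, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a nonlinearity; W and b are never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Only the output weights are learned, in closed form:
        # beta = (H^T H + ridge * I)^-1 H^T y
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def distribute_to_states(phone_duration, state_means):
    """Hypothetical stand-in for the DT state-level step.

    Scales assumed per-state mean durations so that they sum to the
    phone-level duration predicted by the ELM.
    """
    state_means = np.asarray(state_means, dtype=float)
    return phone_duration * state_means / state_means.sum()


def rmse(pred, target):
    """Root mean squared error between predicted and reference durations."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In use, one would fit `ELMRegressor` on phone-level context features against reference phone durations, predict a duration per phone, and pass each prediction to `distribute_to_states` together with that phone's state-level statistics. The design point this illustrates is the split of responsibilities: the regressor fixes the total phone length, and the state-level model only decides how that length is shared among states.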


Bibliographic reference. Wang, Yang / Yang, Minghao / Wen, Zhengqi / Tao, Jianhua (2015): "Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis", in INTERSPEECH-2015, 2197-2201.