This paper proposes a context-dependent additive acoustic modelling technique and applies it to logarithmic fundamental frequency (log F0) modelling for HMM-based speech synthesis. In the proposed technique, the mean vectors of state-output distributions are composed as a weighted sum of decision tree-clustered context-dependent bias terms. The model parameters are estimated, and the decision trees are built, under the maximum likelihood (ML) criterion. The proposed technique has the potential to capture the additive structure of log F0 contours. A preliminary experiment using a small database showed that the proposed technique yielded encouraging results.
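The composition of a mean as a weighted sum of context-dependent bias terms can be illustrated with a minimal sketch. It is not the authors' implementation: the class `AdditiveComponent`, the function `state_mean`, the context keys, the weights, and all numeric values are hypothetical, and each decision tree is replaced by a simple function that maps a context to a leaf index. The mean is treated as a scalar log F0 value for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]  # hypothetical context representation


@dataclass
class AdditiveComponent:
    """One additive term: a weight plus tree-clustered bias values."""
    weight: float
    select_leaf: Callable[[Context], int]  # stand-in for a decision tree
    leaf_biases: List[float]               # one bias term per tree leaf

    def bias(self, context: Context) -> float:
        # Look up the bias of the leaf that this context falls into.
        return self.leaf_biases[self.select_leaf(context)]


def state_mean(components: List[AdditiveComponent], context: Context) -> float:
    """Compose the state-output mean as a weighted sum of bias terms."""
    return sum(c.weight * c.bias(context) for c in components)


# Two illustrative components with made-up values (assumptions, not data
# from the paper): one keyed on phone type, one on phrase position.
phone_level = AdditiveComponent(
    weight=1.0,
    select_leaf=lambda c: 0 if c["phone_type"] == "vowel" else 1,
    leaf_biases=[5.0, 4.5],
)
phrase_level = AdditiveComponent(
    weight=1.0,
    select_leaf=lambda c: 0 if c["phrase_position"] == "initial" else 1,
    leaf_biases=[0.25, -0.25],
)
components = [phone_level, phrase_level]

mean = state_mean(
    components, {"phone_type": "vowel", "phrase_position": "initial"}
)
print(mean)
```

In this sketch each component contributes independently to the mean, which is the sense in which the structure is additive. Estimating the bias terms and growing the trees under the ML criterion is outside the scope of the example.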
Bibliographic reference. Zen, Heiga / Braunschweiler, Norbert (2009): "Context-dependent additive log F0 model for HMM-based speech synthesis", In INTERSPEECH-2009, 2091-2094.