To achieve high-quality synthesised speech in HMM-based speech synthesis, effective modelling of complex contexts is critical. Traditional approaches use context-dependent HMMs with decision-tree-based clustering to model the full contexts. However, weak contexts are difficult to capture with this approach. Context adaptive training provides a structured framework for handling them: standard HMMs represent the normal contexts, and linear transforms represent the additional effects of the weak contexts. In contrast to speaker adaptive training, separate decision trees have to be built for the weak and normal context factors. This paper describes the general framework of context adaptive training and investigates three concrete forms: MLLR-, CMLLR- and CAT-based systems. Experiments on a word-level emphasis synthesis task show that all context adaptive training approaches can outperform the standard full-context-dependent HMM approach. The MLLR-based system achieved the best performance.
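As a rough illustration of the factorisation described above, the following minimal sketch shows one way an MLLR-style mean transform could combine the two factors: a cluster from the normal-context tree supplies a Gaussian mean, and a cluster from the weak-context tree supplies an affine transform applied to it. The cluster names, the dictionary lookups standing in for decision trees, and all numeric values are hypothetical and are not taken from the paper.

```python
# Toy sketch of an MLLR-style factorised mean. Everything here is an
# illustrative assumption, not the paper's implementation.

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Stand-in for the leaves of the normal-context decision tree:
# one Gaussian mean vector per cluster (hypothetical values).
normal_means = {
    "vowel": [1.0, 2.0],
    "consonant": [0.5, -1.0],
}

# Stand-in for the leaves of the weak-context decision tree:
# one affine transform (A, b) per cluster (hypothetical values).
weak_transforms = {
    "neutral": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "emphasis": ([[1.2, 0.0], [0.0, 0.9]], [0.3, -0.1]),
}

def adapted_mean(normal_cluster, weak_cluster):
    """Return A * mu + b for the selected pair of clusters."""
    mu = normal_means[normal_cluster]
    A, b = weak_transforms[weak_cluster]
    return [y + c for y, c in zip(matvec(A, mu), b)]

if __name__ == "__main__":
    print(adapted_mean("vowel", "neutral"))
    print(adapted_mean("vowel", "emphasis"))
```

Because the two lookups are independent, the same normal-context mean can be reused under every weak-context transform, which is the kind of parameter sharing that separate decision trees are meant to allow.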
Bibliographic reference. Yu, Kai / Zen, Heiga / Mairesse, François / Young, Steve (2010): "Context adaptive training with factorized decision trees for HMM-based speech synthesis", In INTERSPEECH-2010, 414-417.