INTERSPEECH 2006 - ICSLP
This paper proposes a constrained structural maximum a posteriori linear regression (CSMAPLR) algorithm to further improve speaker adaptation performance in HMM-based speech synthesis. The algorithm applies the concept of structural maximum a posteriori (SMAP) adaptation to the estimation of the transformation matrices of constrained MLLR (CMLLR): the transformation matrices are estimated recursively with a MAP criterion, proceeding from the root node to the lower nodes of the context decision tree. We incorporate the algorithm into an HSMM-based speech synthesis system and show, through an objective evaluation test, that CSMAPLR adaptation combines the advantages of both CMLLR and SMAPLR adaptation. We also show, through a subjective evaluation test, that CSMAPLR adaptation produces synthetic speech that is more similar to the target speaker than that produced by CMLLR or SMAPLR adaptation.
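The root-to-leaf recursion described in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: instead of a full CMLLR transformation matrix, each tree node here holds a single scalar bias, and the MAP estimate uses a Gaussian prior centred on the parent node's estimate. The names Node, estimate_top_down, and the prior weight tau are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy tree node (hypothetical stand-in for a context decision tree node)."""
    # Adaptation residuals (observation minus model mean) assigned to this node.
    residuals: List[float]
    children: List["Node"] = field(default_factory=list)
    bias: float = 0.0  # scalar stand-in for the node's transform


def estimate_top_down(node: Node, prior_bias: float = 0.0, tau: float = 1.0) -> None:
    """MAP-estimate the bias at each node, using the parent's estimate as the prior mean.

    With a Gaussian prior of weight tau centred on prior_bias, the MAP estimate
    is a weighted average of the prior mean and the node's own data.
    """
    n = len(node.residuals)
    node.bias = (tau * prior_bias + sum(node.residuals)) / (tau + n)
    # The estimate at this node becomes the prior for its children.
    for child in node.children:
        estimate_top_down(child, prior_bias=node.bias, tau=tau)


# Usage: one leaf has adaptation data, the other has none.
leaf_with_data = Node(residuals=[1.0, 1.0, 1.0, 1.0])
leaf_without_data = Node(residuals=[])
root = Node(residuals=[1.0, 1.0, 1.0, 1.0],
            children=[leaf_with_data, leaf_without_data])
estimate_top_down(root, prior_bias=0.0, tau=1.0)
```

In this toy setting, a node with no adaptation data simply inherits its parent's estimate, while a node with data moves toward its own statistics. That is the general behaviour the structural MAP idea is meant to provide, shown here under the simplifying assumptions stated above.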
Bibliographic reference. Nakano, Yuji / Tachibana, Makoto / Yamagishi, Junichi / Kobayashi, Takao (2006): "Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis", In INTERSPEECH-2006, paper 1784-Thu1BuP.10.