16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Speaker Adaptation Using Relevance Vector Regression for HMM-Based Expressive TTS

Doo Hwa Hong, Joun Yeop Lee, Se Young Jang, Nam Soo Kim

Seoul National University, Korea

The conventional maximum likelihood linear regression (MLLR)-based adaptation algorithm applied to acoustic hidden Markov models (HMMs) is restricted to linear regression, which is too limited to represent the details of the mapping characteristics. To overcome this problem, we propose a relevance vector regression (RVR)-based model parameter adaptation technique. In this framework, the conventional technique is extended to use many more basis functions. The weights used to construct the transform matrix are obtained by sparse Bayesian learning, in which most of the weights become zero because of the prior defined with precision hyper-parameters. Furthermore, by using appropriate kernel functions, RVR can take advantage of both linear and nonlinear regression. In the experiments, an emotional speech database is used for adaptation to evaluate the proposed method against the conventional constrained MLLR. From the experimental results, we conclude that the RVR adaptation method performs better than the conventional method.
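The sparse Bayesian learning step described in the abstract can be illustrated with a minimal sketch on a toy one-dimensional regression problem. This is not the authors' HMM adaptation procedure: the function names (`design_matrix`, `posterior`, `fit_rvr`), the Gaussian kernel, its width, the pruning threshold, and the standard relevance-vector re-estimation updates used here are all assumptions made for illustration. The design matrix combines a bias and a linear term with kernel basis functions, and each weight has its own precision hyper-parameter; weights whose precision grows very large are pruned to zero.

```python
import numpy as np


def design_matrix(x, centers, width=0.3):
    """Bias + linear term + Gaussian kernels (assumed kernel choice)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    c = np.asarray(centers, dtype=float).reshape(1, -1)
    rbf = np.exp(-((x - c) ** 2) / (2.0 * width ** 2))
    return np.hstack([np.ones_like(x), x, rbf])


def posterior(P, t, alpha, beta):
    """Gaussian posterior over weights for fixed precisions alpha and beta."""
    Sigma = np.linalg.inv(np.diag(alpha) + beta * P.T @ P)
    mu = beta * Sigma @ P.T @ t
    return mu, Sigma


def fit_rvr(Phi, t, n_iter=100, alpha_max=1e6):
    """Sparse Bayesian regression: returns weights and surviving column indices."""
    n, m = Phi.shape
    active = np.arange(m)          # indices of basis functions still in the model
    alpha = np.ones(m)             # one precision hyper-parameter per weight
    beta = 1.0 / np.var(t)         # initial guess for the noise precision
    for _ in range(n_iter):
        P = Phi[:, active]
        mu, Sigma = posterior(P, t, alpha, beta)
        # gamma_i measures how well the data determine weight i
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        alpha = np.minimum(gamma / np.maximum(mu ** 2, 1e-30), 1e12)
        resid = t - P @ mu
        beta = max(n - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)
        # A very large precision pins the weight at zero, so drop that column
        keep = alpha < alpha_max
        if not keep.any():
            break
        active, alpha = active[keep], alpha[keep]
    mu, _ = posterior(Phi[:, active], t, alpha, beta)
    w = np.zeros(m)                # pruned weights stay exactly zero
    w[active] = mu
    return w, active
```

Calling `fit_rvr(design_matrix(x, centers), t)` on noisy samples returns a weight vector in which only the entries listed in `active` are nonzero; predictions are then `design_matrix(x_new, centers) @ w`.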

Bibliographic reference.  Hong, Doo Hwa / Lee, Joun Yeop / Jang, Se Young / Kim, Nam Soo (2015): "Speaker adaptation using relevance vector regression for HMM-based expressive TTS", In INTERSPEECH-2015, 1216-1220.