INTERSPEECH 2004 - ICSLP

This paper describes an approach to controlling the style of synthetic speech in HMM-based speech synthesis. Here, style refers to a speaking style or an emotional expression in speech. We model each speech synthesis unit with a context-dependent HMM in which the mean vector of each output distribution is a function of a parameter vector called the style control vector. Specifically, we assume that the mean vector is modeled by multiple regression on the style control vector. The multiple regression matrices are estimated with the EM algorithm, along with the other HMM parameters. In the synthesis stage, the mean vectors are modified by transforming an arbitrarily given control vector associated with the desired style. The results of subjective tests show that the style of synthetic speech can be controlled by choosing the style control vector appropriately.
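The synthesis-stage transformation can be illustrated with a minimal sketch. It assumes the regression takes the affine form mu = H [1, v_1, ..., v_L]^T, where H is a regression matrix for one output distribution and v is the style control vector; the leading 1 (a bias term), the function name `style_adapted_mean`, and the numeric values of `H` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def style_adapted_mean(regression_matrix: Sequence[Sequence[float]],
                       style_vector: Sequence[float]) -> List[float]:
    """Return mu = H @ [1, v_1, ..., v_L] for one output distribution.

    Assumption: the regression is affine, so the style control vector is
    augmented with a leading 1 that multiplies a bias column of H.
    """
    xi = [1.0] + list(style_vector)
    for row in regression_matrix:
        if len(row) != len(xi):
            raise ValueError("each row of H needs len(style_vector) + 1 entries")
    # Plain matrix-vector product, one output dimension per row of H.
    return [sum(h * x for h, x in zip(row, xi)) for row in regression_matrix]


# Hypothetical regression matrix: 2-dimensional mean, 2-dimensional style space.
H = [[1.0, 0.5, -0.2],
     [0.0, 0.3, 0.8]]

origin = style_adapted_mean(H, [0.0, 0.0])   # style vector at the origin
styled = style_adapted_mean(H, [1.0, 0.0])   # fully along the first style axis
blend = style_adapted_mean(H, [0.5, 0.5])    # intermediate point between axes
```

Because the mean is linear in the control vector, intermediate vectors such as `[0.5, 0.5]` yield means that lie between those of the individual styles, which is what makes continuous style control possible under this assumption.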
