This paper addresses the problem of prosodic variability in HMM-based spontaneous conversational speech synthesis. We propose an extended context set that includes information peculiar to spontaneous speech, derived from the annotation data embedded in a large-scale database of spontaneous Japanese. Objective and subjective evaluation experiments show the effectiveness of the newly introduced contexts. We also propose stopping criteria for decision-tree clustering to alleviate over-fitting. Experimental results show that restricting the size of each leaf node can improve the quality of synthetic speech.
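To illustrate the kind of stopping criterion mentioned above, the following is a minimal sketch of greedy decision-tree clustering that rejects any split producing a leaf below a minimum size. It is not the paper's implementation: the function names (`log_likelihood`, `cluster`), the 1-D Gaussian likelihood gain, the toy context keys (`filler`, `pos`), and the reading of "restriction of the size of each leaf node" as a lower bound on samples per leaf are all assumptions made for illustration.

```python
import math


def log_likelihood(values):
    """Log-likelihood of values under one ML-fitted 1-D Gaussian (illustrative)."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-6)  # variance floor
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def cluster(samples, questions, min_leaf_size):
    """Greedily split samples; return the leaves as a list of sample lists.

    samples:   list of (context_dict, value) pairs
    questions: list of (name, predicate) pairs, predicate(context) -> bool
    """
    if min_leaf_size < 1:
        raise ValueError("min_leaf_size must be at least 1")
    parent_ll = log_likelihood([v for _, v in samples])
    best = None
    for _name, pred in questions:
        yes = [s for s in samples if pred(s[0])]
        no = [s for s in samples if not pred(s[0])]
        # Assumed stopping criterion: skip splits that create a too-small leaf.
        if len(yes) < min_leaf_size or len(no) < min_leaf_size:
            continue
        gain = (log_likelihood([v for _, v in yes])
                + log_likelihood([v for _, v in no])
                - parent_ll)
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, yes, no)
    if best is None:
        return [samples]  # no admissible split: this node becomes a leaf
    _, yes, no = best
    return (cluster(yes, questions, min_leaf_size)
            + cluster(no, questions, min_leaf_size))


# Toy data: hypothetical contexts and a scalar prosodic value per sample.
samples = (
    [({"filler": True, "pos": i % 2}, 5.0 + 0.1 * i) for i in range(4)]
    + [({"filler": False, "pos": i % 2}, 1.0 + 0.1 * i) for i in range(4)]
)
questions = [
    ("is_filler", lambda c: c["filler"]),
    ("pos_is_1", lambda c: c["pos"] == 1),
]

leaves = cluster(samples, questions, min_leaf_size=3)
print([len(leaf) for leaf in leaves])
```

Raising `min_leaf_size` yields fewer, larger leaves, so each leaf's statistics are estimated from more data at the cost of coarser context distinctions. This trade-off is the general motivation for such a restriction.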
Bibliographic reference. Koriyama, Tomoki / Nose, Takashi / Kobayashi, Takao (2011): "On the use of extended context for HMM-based spontaneous conversational speech synthesis", In INTERSPEECH-2011, 2657-2660.