We explore the effect of prosodic contextual factors on HMM-based speech synthesis. A baseline system uses a large number of contextual factors during model training, so the cost of parameter tying by context clustering becomes relatively high compared with that in speech recognition. We examine the choice of prosodic contexts using objective measures on English and Japanese speech data. The experimental results show that more compact context sets give performance comparable or close to that of the conventional full-context set.
Bibliographic reference. Yokomizo, Shuji / Nose, Takashi / Kobayashi, Takao (2010): "Evaluation of prosodic contextual factors for HMM-based speech synthesis", In INTERSPEECH-2010, 430-433.