7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper describes a new technique for constructing the decision tree used to cluster an average voice model, i.e., speaker independent speech units. In this technique, we first train speaker dependent models using a multi-speaker speech database, and then construct a speaker independent decision tree for context clustering that is common to these speaker dependent models. When a node of the decision tree is split, only those context related questions that can split the node for all speaker dependent models are adopted. Consequently, every node of the decision tree has training data from all speakers. A paired comparison test shows that the average voice model trained with the proposed technique synthesizes more natural sounding speech than the conventional average voice model.
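The split constraint described above can be illustrated with a minimal sketch. It assumes a hypothetical representation: each speaker dependent model's data at a node is a set of context labels, and a question is the set of contexts that answer "yes". A question is admissible only if both children keep data from every speaker. The rule for choosing among admissible questions (summing a caller-supplied per-speaker gain) is an assumption of this sketch, not necessarily the paper's criterion.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# Hypothetical representation: a context label such as a triphone string.
Context = str
# Data at one tree node: speaker id -> contexts observed for that speaker.
NodeData = Dict[str, Set[Context]]
# A question is modeled as the set of contexts that answer "yes".
Question = FrozenSet[Context]


def splits_for_all_speakers(node: NodeData, question: Question) -> bool:
    """True if every speaker keeps data in both the 'yes' and the 'no' child."""
    for contexts in node.values():
        if not (contexts & question) or not (contexts - question):
            return False
    return True


def admissible_questions(node: NodeData, questions: Iterable[Question]) -> List[Question]:
    """Keep only questions that split the node for all speaker dependent models."""
    return [q for q in questions if splits_for_all_speakers(node, q)]


def best_question(
    node: NodeData,
    questions: Iterable[Question],
    gain: Callable[[str, Set[Context], Set[Context]], float],
) -> Optional[Question]:
    """Pick the admissible question with the largest gain summed over speakers.

    The summed-gain selection rule is an assumption of this sketch.
    Returns None when no question can split the node for all speakers.
    """
    best: Optional[Question] = None
    best_score = float("-inf")
    for q in admissible_questions(node, questions):
        score = sum(gain(spk, ctx & q, ctx - q) for spk, ctx in node.items())
        if score > best_score:
            best, best_score = q, score
    return best


def split(node: NodeData, question: Question) -> Tuple[NodeData, NodeData]:
    """Split a node into 'yes' and 'no' children, keeping per-speaker data."""
    yes = {spk: ctx & question for spk, ctx in node.items()}
    no = {spk: ctx - question for spk, ctx in node.items()}
    return yes, no
```

For example, with two speakers whose data at a node are `{"a-i+u", "k-a+i", "s-o+N"}` and `{"a-i+u", "s-a+i"}`, the question `frozenset({"a-i+u"})` is admissible, while `frozenset({"k-a+i"})` is rejected because the second speaker would have no data in the "yes" child. Filtering before scoring is what guarantees that every node of the resulting tree has training data from all speakers.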
Bibliographic reference. Yamagishi, Junichi / Tamura, Masatsune / Masuko, Takashi / Tokuda, Keiichi / Kobayashi, Takao (2002): "A context clustering technique for average voice model in HMM-based speech synthesis", In ICSLP-2002, 133-136.