In speech synthesis systems based on waveform concatenation, using longer units can produce more natural synthetic speech. To make better use of the longer units in a corpus, this paper proposes a hierarchical non-uniform unit selection framework. Each layer of the framework is an independent search procedure that looks for units of a different size and applies naturalness measures suited to that unit type. We have applied the framework to our Mandarin speech synthesis system, following the Chinese prosodic structure and taking into account statistics from our corpus. Experimental results show that it outperforms our previous system.
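As a rough illustration of the layered idea described above, the following minimal sketch tries the largest unit first and falls back to smaller units when no sufficiently natural candidate is found. It is not the paper's implementation: the layer names (`phrase`, `word`, `syllable`), the `Target` and `Unit` classes, the pitch-only cost, the thresholds, and the toy corpus are all hypothetical, and the concatenation cost that a real unit selection system would also use is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Target:
    """One node of the target prosodic tree (hypothetical structure)."""
    text: str                 # unit label, here a pinyin string
    level: str                # "phrase", "word", or "syllable"
    pitch: float              # illustrative target prosody feature
    children: List["Target"] = field(default_factory=list)


@dataclass
class Unit:
    """One recorded candidate unit in the corpus (hypothetical structure)."""
    text: str
    pitch: float


Cost = Callable[[Unit, Target], float]
Corpus = Dict[str, Dict[str, List[Unit]]]   # level -> text -> candidates


def select(target: Target, corpus: Corpus,
           costs: Dict[str, Cost], thresholds: Dict[str, float]) -> List[Unit]:
    """Pick the largest acceptable unit, descending a layer when needed."""
    cost = costs[target.level]              # each layer has its own measure
    candidates = corpus.get(target.level, {}).get(target.text, [])
    if candidates:
        best = min(candidates, key=lambda u: cost(u, target))
        # Accept if natural enough, or if there is no smaller layer left.
        if cost(best, target) <= thresholds[target.level] or not target.children:
            return [best]
    if not target.children:
        raise LookupError(f"no unit for {target.text!r} at level {target.level}")
    selected: List[Unit] = []
    for child in target.children:           # fall back to the next layer
        selected.extend(select(child, corpus, costs, thresholds))
    return selected


def pitch_cost(unit: Unit, target: Target) -> float:
    """Assumed naturalness measure: absolute pitch mismatch."""
    return abs(unit.pitch - target.pitch)


# Toy corpus and target, invented for this example only.
corpus: Corpus = {
    "phrase": {"ni hao shi jie": [Unit("ni hao shi jie", 300.0)]},
    "word": {"ni hao": [Unit("ni hao", 210.0)]},
    "syllable": {"shi": [Unit("shi", 190.0)],
                 "jie": [Unit("jie", 180.0), Unit("jie", 205.0)]},
}
costs = {"phrase": pitch_cost, "word": pitch_cost, "syllable": pitch_cost}
thresholds = {"phrase": 20.0, "word": 20.0, "syllable": 20.0}

target = Target("ni hao shi jie", "phrase", 200.0, [
    Target("ni hao", "word", 205.0, [
        Target("ni", "syllable", 210.0), Target("hao", "syllable", 200.0)]),
    Target("shi jie", "word", 195.0, [
        Target("shi", "syllable", 190.0), Target("jie", "syllable", 200.0)]),
])

selected = select(target, corpus, costs, thresholds)
print([u.text for u in selected])
```

In this toy run the phrase-level candidate is rejected because its pitch mismatch exceeds the threshold, the word "ni hao" is accepted as a single unit, and "shi jie" is assembled from syllables because the corpus holds no word-level candidate for it. The same cost function is reused for every layer only for brevity; the `costs` mapping is where layer-specific measures would be plugged in.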
Bibliographic reference. Xu, Jun / Huang, Dezhi / Wang, Yongxin / Dong, Yuan / Cai, Lianhong / Wang, Haila (2007): "Hierarchical non-uniform unit selection based on prosodic structure", In INTERSPEECH-2007, 2861-2864.