8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

A Preselection Method Based on Cost Degradation from the Optimal Sequence for Concatenative Speech Synthesis

Nobuyuki Nishizawa, Hisashi Kawai

KDDI R&D Laboratories Inc., Japan

This paper proposes a novel unit preselection criterion for concatenative speech synthesis. To reduce the computational cost of unit selection, units that are unlikely to be selected should be pruned in a preselection step before the Viterbi search. The proposed criterion is the difference between the cost of the locally optimal sequence in which a given unit is fixed and the cost of the globally optimal sequence, so preselection can take into account not only the target cost but also the concatenation cost. For real-time speech synthesis, a preselection method using decision trees, in which a unit can be bound to multiple nodes of a tree, is also introduced. In a unit selection experiment, the proposed method with decision trees built from 8 hours of training data yields lower costs for the selected units than conventional online preselection based on target costs. The results also show that the method is more effective when the computational cost is tightly limited.
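
The criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the total cost of a sequence is the plain sum of per-unit target costs and pairwise concatenation costs, and the names `cost_degradation`, `target`, and `concat` are hypothetical. A forward pass and a backward pass of a min-sum (Viterbi-style) recursion give, for every candidate unit, the cost of the best sequence forced through that unit; subtracting the globally optimal cost gives the degradation.

```python
from typing import Callable, List


def cost_degradation(
    target: List[List[float]],
    concat: Callable[[int, int, int], float],
) -> List[List[float]]:
    """Cost degradation of every candidate unit (illustrative assumption).

    target[i][u]    : target cost of candidate u at position i.
    concat(i, a, b) : cost of joining candidate a at position i
                      to candidate b at position i + 1.
    """
    n = len(target)

    # fwd[i][u]: cost of the best partial sequence ending in unit u at
    # position i, including the target cost of u.
    fwd = [list(target[0])]
    for i in range(1, n):
        fwd.append([
            target[i][b] + min(
                fwd[i - 1][a] + concat(i - 1, a, b)
                for a in range(len(target[i - 1]))
            )
            for b in range(len(target[i]))
        ])

    # bwd[i][u]: cost of the best continuation after unit u at position i,
    # excluding the target cost of u.
    bwd = [[0.0] * len(t) for t in target]
    for i in range(n - 2, -1, -1):
        for a in range(len(target[i])):
            bwd[i][a] = min(
                concat(i, a, b) + target[i + 1][b] + bwd[i + 1][b]
                for b in range(len(target[i + 1]))
            )

    best = min(fwd[-1])  # cost of the globally optimal sequence
    return [
        [fwd[i][u] + bwd[i][u] - best for u in range(len(target[i]))]
        for i in range(n)
    ]


# Toy lattice: two positions, two candidates each (made-up costs).
toy_target = [[1.0, 2.0], [0.5, 3.0]]
toy_concat = lambda i, a, b: 0.0 if a == b else 1.0
deg = cost_degradation(toy_target, toy_concat)
```

In this toy lattice, units on the globally optimal sequence have a degradation of zero, and a unit with a large degradation is a natural candidate for pruning. How such values are collected from training data and attached to decision-tree nodes is not specified in the abstract and is not modelled here.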

