7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper presents an extension to previous work, which used an imitation speech database and a prosodic unit selection algorithm to improve the naturalness of synthesized speech. The basic approach of the system is to combine a rule-generated prosody with a corpus-based prosody module, trying to retain both the robustness of the rule prosody and the naturalness of the human-recorded speech units. This combination was achieved by using a database of imitation speech, which enables a higher level of annotation that is then used by a dynamic unit selection algorithm. Although listeners have been shown to prefer the prosody generated with this method over the original rule-generated prosody, the usual problems related to selection from an undersized training corpus were occasionally present.
Instead of increasing the size of the training database, a different solution is investigated here, which is to perform a controlled fallback to the rule prosody, but in a way which is compatible with the unit selection approach. The suggested method has a minimal effect on the required memory size and the amount of computation, and was shown to produce favorable results.
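As an illustration only, the following minimal sketch shows one way a fallback to rule prosody could be made compatible with a dynamic unit selection search. It is not the method of the paper, whose details are not given in this abstract. The sketch assumes that prosody is reduced to a single pitch value per unit, that the rule-generated value is added to every position as an extra candidate with a fixed penalty, and that target and join costs are simple absolute differences. The names `select_with_fallback`, `fallback_penalty` and `join_weight` are hypothetical.

```python
from typing import List, Tuple


def select_with_fallback(
    rule_f0: List[float],
    candidates: List[List[float]],
    fallback_penalty: float = 20.0,  # assumed fixed cost of using the rule value
    join_weight: float = 1.0,        # assumed weight of the concatenation cost
) -> List[Tuple[float, bool]]:
    """Viterbi search over database candidates plus one fallback per position.

    Returns the chosen (pitch value, is_fallback) pair for every position.
    """
    if not rule_f0:
        return []

    # Each lattice column holds (value, target cost, is_fallback) entries.
    lattice = []
    for target, cands in zip(rule_f0, candidates):
        column = [(c, abs(c - target), False) for c in cands]
        # The rule-generated value competes as an ordinary candidate,
        # so the search itself needs no special case for the fallback.
        column.append((target, fallback_penalty, True))
        lattice.append(column)

    # best[i][j] = (cumulative cost, index of best predecessor)
    best = [[(tc, -1) for (_, tc, _) in lattice[0]]]
    for i in range(1, len(lattice)):
        prev_col, prev_best = lattice[i - 1], best[i - 1]
        row = []
        for val, tc, _ in lattice[i]:
            costs = [
                prev_best[k][0] + join_weight * abs(val - prev_col[k][0])
                for k in range(len(prev_col))
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_min] + tc, k_min))
        best.append(row)

    # Backtrack from the cheapest final entry.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(lattice) - 1, -1, -1):
        val, _, is_fallback = lattice[i][j]
        path.append((val, is_fallback))
        j = best[i][j][1]
    path.reverse()
    return path
```

In this sketch the fallback adds only one candidate per position, so memory use and computation grow very little; a poorly matching database unit loses to the rule value whenever its combined target and join cost exceeds the penalty.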
Bibliographic reference. Meron, Joram (2002): "Applying fallback to prosodic unit selection from a small imitation database", In ICSLP-2002, 2093-2096.