5th International Conference on Spoken Language Processing
This paper presents a new method for automatically generating speech synthesis units. The algorithm, called Closed-Loop Training (CLT), is based on evaluating and reducing the distortion in synthesized speech. It analytically minimizes the distortion caused by the synthesis process, such as prosodic modification. The distortion is measured as the error between synthesized speech units and natural speech units in a large speech database (corpus). The CLT method thus generates the synthesis units that most closely resemble natural speech after the synthesis process. In this paper, CLT is applied to a waveform-concatenation-based synthesizer whose basic unit is a diphone. With CLT, the synthesizer generates clear and smooth synthetic speech even with a relatively small volume of synthesis units.
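To illustrate what minimizing distortion "in an analytic way" can look like, the following is a minimal sketch, not the authors' implementation. It assumes that the synthesis process can be modeled as a linear operation on the unit (here a hypothetical linear-interpolation duration change stands in for prosodic modification) and that distortion is the summed squared error against natural segments. Under those assumptions, the unit that minimizes the total distortion is the solution of a set of linear equations. All function names and the toy data are hypothetical.

```python
# Sketch of closed-loop unit generation under simplifying assumptions:
# the "synthesis process" is a linear duration change (an assumption),
# and distortion is squared error against natural segments.

def resample_matrix(n_out, n_in):
    """Linear-interpolation matrix mapping a length-n_in unit to n_out samples."""
    rows = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        k = min(int(pos), n_in - 2)
        frac = pos - k
        row = [0.0] * n_in
        row[k] = 1.0 - frac
        row[k + 1] = frac
        rows.append(row)
    return rows

def synthesize(unit, n_out):
    """Apply the assumed synthesis process: stretch the unit to n_out samples."""
    A = resample_matrix(n_out, len(unit))
    return [sum(a * u for a, u in zip(row, unit)) for row in A]

def distortion(unit, naturals):
    """Total squared error between synthesized and natural segments."""
    total = 0.0
    for r in naturals:
        s = synthesize(unit, len(r))
        total += sum((x - y) ** 2 for x, y in zip(r, s))
    return total

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def closed_loop_unit(naturals, n_unit):
    """Unit minimizing total distortion: solve (sum A^T A) u = sum A^T r."""
    M = [[0.0] * n_unit for _ in range(n_unit)]
    b = [0.0] * n_unit
    for r in naturals:
        A = resample_matrix(len(r), n_unit)
        for row, target in zip(A, r):
            for i in range(n_unit):
                if row[i] == 0.0:
                    continue
                b[i] += row[i] * target
                for j in range(n_unit):
                    M[i][j] += row[i] * row[j]
    return solve(M, b)

# Toy "corpus": natural segments of one diphone at different durations.
naturals = [
    [0.0, 1.0, 0.2, -0.9],
    [0.1, 0.6, 0.9, 0.3, -0.4, -1.0],
    [0.0, 0.8, 0.6, -0.3, -0.8],
]
open_loop = naturals[0]                 # pick a natural segment as the unit
closed_loop = closed_loop_unit(naturals, 4)
print("open-loop distortion:  ", distortion(open_loop, naturals))
print("closed-loop distortion:", distortion(closed_loop, naturals))
```

In this sketch the open-loop choice simply reuses one natural segment as the unit, whereas the closed-loop unit is optimized against the output of the (assumed) synthesis process, so its total distortion over the toy corpus cannot exceed that of any other candidate unit of the same length.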
Bibliographic reference. Akamine, Masami / Kagoshima, Takehiko (1998): "Analytic generation of synthesis units by closed loop training for totally speaker driven text to speech system (TOS drive TTS)", In ICSLP-1998, paper 0139.