Customizing base unit set with speech database in TTS systems

Yining Chen, Yong Zhao, Min Chu

In a unit-selection-based speech synthesizer, defining a good unit set is crucial to speech quality. In this paper, a method for customizing the TTS base unit set to a specific speech corpus is proposed. Multi-phoneme units are built up from the initial phoneme-sized units. A new multi-phoneme unit is added to the inventory based on its own frequency count and on how adding it affects the frequency counts of other units. As a result, a large base unit set containing many multi-phoneme units is formed when the speech corpus is large, whereas for a small speech corpus only a few bi-phoneme or tri-phoneme units are found. Such a scalable base unit set makes it possible to achieve better smoothness in concatenation while maintaining the naturalness of prosody. Evaluation results show that, after replacing the phone-sized base unit set with the customized set, the search speed improves by a factor of 5 and the customized set obtains a 59% preference score.
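The abstract does not give the exact selection criterion, so the following is only a minimal sketch of one way frequency-driven unit growth might look. It assumes a greedy loop and a single hypothetical threshold, `min_count`: a candidate unit formed from two adjacent units is accepted only if the merged unit and each of its constituent units still occur at least `min_count` times afterwards. The function names, the `+` joining convention, and the acceptance rule are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one merged unit."""
    merged = pair[0] + "+" + pair[1]
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def grow_unit_set(utterances, min_count):
    """Greedily add multi-phoneme units; return the new units in order of acceptance.

    Assumed rule (not from the paper): accept a candidate only if the merged
    unit and both constituents keep a count of at least `min_count`.
    """
    corpus = [list(u) for u in utterances]
    new_units = []
    while True:
        pair_counts = Counter(p for seq in corpus for p in zip(seq, seq[1:]))
        accepted = False
        for pair, n in pair_counts.most_common():
            if n < min_count:
                break  # remaining candidates are too rare
            trial = [merge_pair(seq, pair) for seq in corpus]
            counts = Counter(u for seq in trial for u in seq)
            merged = pair[0] + "+" + pair[1]
            # Check the candidate's own count and the affected counts of its parts.
            if counts[merged] >= min_count and all(counts[u] >= min_count for u in pair):
                corpus = trial
                new_units.append(merged)
                accepted = True
                break
        if not accepted:
            return new_units


toy = [["s", "ih", "t"], ["s", "ih", "t"], ["s", "ae"], ["s", "ae"], ["ih", "p"], ["ih", "p"]]
print(grow_unit_set(toy, 2))  # ['s+ih']
```

In this toy corpus only `s+ih` is accepted: every other candidate would leave one of its constituent units with fewer than `min_count` occurrences. Raising `min_count` yields fewer new units, which mirrors the abstract's observation that a small corpus produces only a few multi-phoneme units.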

doi: 10.21437/Interspeech.2005-795

