In unit-selection-based speech synthesizers, defining a good base unit set is crucial to speech quality. In this paper, a method is proposed for customizing the base unit set of a TTS system with a specific speech corpus. Multi-phoneme units are built up from the initial phoneme-sized units: a new multi-phoneme unit is added to the inventory based on its own frequency count and on the frequency counts of the other units that its addition affects. As a result, a large speech corpus yields a large base unit set containing many multi-phoneme units, whereas a small corpus yields only a few bi-phoneme or tri-phoneme units. Such a scalable base unit set makes it possible to achieve smoother concatenation while maintaining the naturalness of prosody. Evaluation results show that replacing the phone-sized base unit set with the customized set makes unit search five times faster and yields a 59% preference score.
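The abstract does not give the exact selection rule, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It swaps in a greedy, byte-pair-encoding-style merge of adjacent units in a phoneme-segmented corpus. The threshold `min_count` and the "affected units" test (each unit absorbed by a merge must keep at least `min_count` instances) are hypothetical stand-ins for the paper's frequency criteria. Units are tuples of phoneme symbols, so merging two units is tuple concatenation.

```python
from collections import Counter


def unit_counts(corpus):
    """Frequency of every unit in a segmented corpus (a list of utterances)."""
    return Counter(u for utt in corpus for u in utt)


def pair_counts(corpus):
    """Frequency of every pair of adjacent units."""
    return Counter(p for utt in corpus for p in zip(utt, utt[1:]))


def merge(utt, pair):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(utt):
        if i + 1 < len(utt) and (utt[i], utt[i + 1]) == pair:
            out.append(utt[i] + utt[i + 1])  # tuple concatenation -> longer unit
            i += 2
        else:
            out.append(utt[i])
            i += 1
    return out


def grow_unit_set(phoneme_corpus, min_count):
    """Greedily add multi-phoneme units while the corpus can support them.

    Returns (unit inventory, re-segmented corpus). `min_count` is a
    hypothetical threshold, not a parameter taken from the paper.
    """
    corpus = [[(p,) for p in utt] for utt in phoneme_corpus]  # phoneme-sized start
    while True:
        for pair, n in pair_counts(corpus).most_common():
            if n < min_count:
                # Candidates are sorted by count, so none of the rest qualify.
                return set(unit_counts(corpus)), corpus
            merged = [merge(utt, pair) for utt in corpus]
            counts = unit_counts(merged)
            new_unit = pair[0] + pair[1]
            # Own frequency: the new unit must occur often enough after merging.
            # Affected frequency (assumed rule): the units it absorbs must keep
            # enough instances of their own.
            if counts[new_unit] >= min_count and all(counts[u] >= min_count for u in pair):
                corpus = merged
                break
        else:
            # No pairs left, or every candidate failed a test.
            return set(unit_counts(corpus)), corpus
```

Because both tests use the same absolute threshold, a larger corpus lets more candidates pass, so the inventory grows with corpus size, which mirrors the scalability described in the abstract. For example, `grow_unit_set([["s", "t", "a"]] * 5 + [["s", "o"]] * 2, 2)` adds the unit `("s", "t")` but leaves the phoneme-sized units in place.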
Cite as: Chen, Y., Zhao, Y., Chu, M. (2005) Customizing base unit set with speech database in TTS systems. Proc. Interspeech 2005, 2561-2564, doi: 10.21437/Interspeech.2005-795
@inproceedings{chen05h_interspeech, author={Yining Chen and Yong Zhao and Min Chu}, title={{Customizing base unit set with speech database in TTS systems}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={2561--2564}, doi={10.21437/Interspeech.2005-795} }