ISCA Archive Interspeech 2005

Customizing base unit set with speech database in TTS systems

Yining Chen, Yong Zhao, Min Chu

In a unit-selection-based speech synthesizer, defining a good unit set is crucial to speech quality. This paper proposes a method for customizing the TTS base unit set to a specific speech corpus. Multi-phoneme units are boosted from the initial phoneme-sized units. A new multi-phoneme unit is added to the inventory based on its own frequency count and the affected frequency counts of other units. As a result, a large speech corpus yields a large base unit set containing many multi-phoneme units, whereas a small speech corpus yields only a few bi-phoneme or tri-phoneme units. Such a scalable base unit set makes it possible to achieve smoother concatenation while maintaining the naturalness of prosody. Evaluation results show that, after replacing the phone-sized base unit set with the customized set, the search speed improves fivefold and a 59% preference score is obtained.
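The abstract does not give the selection criterion in detail, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a greedy procedure: the most frequent adjacent pair of units is merged into a longer unit if the pair occurs at least `min_count` times and each constituent unit still occurs at least `min_remaining` times afterwards. The function names, both thresholds, and the `+` joining convention are hypothetical.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace each left-to-right occurrence of `pair` in `seq` with one joined unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "+" + seq[i + 1])  # hypothetical naming convention
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def grow_unit_set(utterances, min_count, min_remaining):
    """Greedily add multi-phoneme units (assumed criterion, not the paper's).

    utterances: list of phoneme sequences (lists of strings).
    Returns the sorted unit inventory and the re-segmented corpus.
    """
    corpus = [list(u) for u in utterances]
    while True:
        # Frequency count of every adjacent unit pair in the current segmentation.
        pair_counts = Counter(p for seq in corpus for p in zip(seq, seq[1:]))
        accepted = False
        for pair, count in pair_counts.most_common():
            if count < min_count:
                break  # remaining candidates are rarer still
            candidate = [merge_pair(seq, pair) for seq in corpus]
            # Affected counts: how often the constituent units still occur on their own.
            unit_counts = Counter(u for seq in candidate for u in seq)
            if all(unit_counts[u] >= min_remaining for u in pair):
                corpus = candidate
                accepted = True
                break
        if not accepted:
            break
    return sorted({u for seq in corpus for u in seq}), corpus
```

Under these assumptions, lowering `min_count` on the same corpus produces longer units, which loosely mirrors the scalability described in the abstract: more data above the threshold yields more multi-phoneme units.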

doi: 10.21437/Interspeech.2005-795

Cite as: Chen, Y., Zhao, Y., Chu, M. (2005) Customizing base unit set with speech database in TTS systems. Proc. Interspeech 2005, 2561-2564, doi: 10.21437/Interspeech.2005-795

  author={Yining Chen and Yong Zhao and Min Chu},
  title={{Customizing base unit set with speech database in TTS systems}},
  booktitle={Proc. Interspeech 2005},