EUROSPEECH 2003 - INTERSPEECH 2003
This paper presents a framework for custom-tailoring voice fonts in data-driven TTS systems. Three criteria for unit pruning are proposed: the prosodic outlier criterion, the importance criterion, and a combination of the two. Voice fonts of different sizes, pruned with each of the three criteria, are evaluated by simulating speech synthesis over a large amount of text while estimating naturalness with an objective measure. The results show that the combined criterion performs best among the three. With the combined criterion, naturalness remains almost unchanged when 50% of instances are pruned. The pre-estimated curve of naturalness vs. database size can serve as a reference for custom-tailoring a voice font.
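The abstract does not define how the criteria are computed, so the following is only a minimal sketch of what a combined pruning score might look like, not the authors' method. It assumes each instance carries a unit label, a single pitch value, and a usage count gathered from simulated synthesis. The outlier measure (a within-unit z-score of pitch), the importance measure (relative usage), the `weight` parameter, and all function and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def combined_scores(instances, weight=0.5):
    """Return a keep-score per instance id; higher means more worth keeping."""
    by_unit = defaultdict(list)
    for inst in instances:
        by_unit[inst["unit"]].append(inst)

    scores = {}
    for group in by_unit.values():
        pitches = [i["pitch"] for i in group]
        mu, sigma = mean(pitches), pstdev(pitches)
        max_count = max(i["use_count"] for i in group)
        for i in group:
            # Assumed outlier measure: absolute z-score of pitch within the unit class.
            outlier = abs(i["pitch"] - mu) / sigma if sigma > 0 else 0.0
            # Assumed importance measure: usage relative to the most-used instance.
            importance = i["use_count"] / max_count if max_count > 0 else 0.0
            # Hypothetical combination: reward importance, penalize outliers.
            scores[i["id"]] = weight * importance - (1 - weight) * outlier
    return scores


def prune(instances, keep_fraction, weight=0.5):
    """Keep the top-scoring fraction of instances (at least one)."""
    scores = combined_scores(instances, weight)
    n_keep = max(1, round(len(instances) * keep_fraction))
    ranked = sorted(instances, key=lambda i: scores[i["id"]], reverse=True)
    return ranked[:n_keep]
```

Calling `prune(instances, 0.5)` would correspond to the 50% pruning level mentioned above. Sweeping `keep_fraction` and scoring each pruned inventory with an objective naturalness measure is one way a naturalness vs. database size curve could be traced.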
Bibliographic reference. Zhao, Yong / Chu, Min / Peng, Hu / Chang, Eric (2003): "Custom-tailoring TTS voice font - keeping the naturalness when reducing database size", In EUROSPEECH-2003, 2957-2960.