Unit selection text-to-speech synthesis relies on multiple cost criteria, each encapsulating a different aspect of acoustic and prosodic context at any given concatenation point. For a particular set of criteria, the relative weighting of the resulting costs crucially affects final candidate ranking. Their influence is typically determined in an empirical manner (e.g., based on a limited amount of synthesized data), yielding global weights that are thus applied to all concatenations indiscriminately. This paper proposes an alternative approach, based on a data-driven framework separately optimized for each concatenation. The cost distribution in every information stream is dynamically leveraged to locally shift weight towards those characteristics that prove most discriminative at this point. An illustrative case study underscores the potential benefits of this solution.
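To make the idea of local weighting concrete, the following is a minimal sketch and not the algorithm of the paper, which the abstract does not spell out. It assumes that each candidate at a concatenation point carries one cost per information stream, that the streams are already on comparable scales, and that the spread (population standard deviation) of a stream's costs across candidates can stand in for how discriminative that stream is at that point. The function names `local_weights` and `rank_candidates`, the cost values, and the global weights are all hypothetical.

```python
from statistics import pstdev


def local_weights(costs):
    """Return one weight per stream for a single concatenation point.

    costs: list of candidates, each a list of per-stream costs.
    Assumption (not from the paper): a stream's weight is proportional
    to the standard deviation of its costs across the candidates.
    """
    n_streams = len(costs[0])
    spreads = [pstdev([cand[k] for cand in costs]) for k in range(n_streams)]
    total = sum(spreads)
    if total == 0:
        # No stream separates the candidates: fall back to uniform weights.
        return [1.0 / n_streams] * n_streams
    return [s / total for s in spreads]


def rank_candidates(costs, weights):
    """Return candidate indices sorted by increasing weighted total cost."""
    totals = [sum(w * c for w, c in zip(weights, cand)) for cand in costs]
    return sorted(range(len(costs)), key=lambda i: totals[i])


# Hypothetical costs for three candidates over two streams.
# Stream 0 barely separates the candidates; stream 1 separates them clearly.
costs = [
    [0.50, 0.9],
    [0.60, 0.1],
    [0.55, 0.5],
]

global_w = [0.9, 0.1]          # hypothetical empirically tuned global weights
local_w = local_weights(costs)  # weights recomputed for this point only

print(rank_candidates(costs, global_w))  # [0, 2, 1]
print(rank_candidates(costs, local_w))   # [1, 2, 0]
```

In this toy case the global weights favor candidate 0 because they emphasize the nearly flat stream, whereas the locally derived weights shift emphasis to the stream that actually distinguishes the candidates and rank candidate 1 first.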
Bibliographic reference. Bellegarda, Jerome R. (2009): "A novel approach to cost weighting in unit selection TTS", In INTERSPEECH-2009, 744-747.