Fourth ISCA ITRW on Speech Synthesis
August 29 - September 1, 2001
In concatenative text-to-speech (TTS), the size of the speech corpus is closely related to synthetic speech quality. In this paper, we describe our work on a new corpus-based Bell Labs TTS system. This work encompasses large acoustic inventories with a rich set of annotations, models and data structures for representing and managing such inventories, and an optimal unit selection algorithm that accommodates a broad range of possible cost criteria. We also propose a new method for setting the weights in the cost functions based on a perceptual preference test. Our results show that this approach can successfully predict human preference patterns. Synthetic speech using weights determined in this manner consistently demonstrates smoother transitions and higher voice quality than speech using manually set weights.
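As an illustration of what an optimal unit search over weighted costs can look like, the following is a minimal dynamic-programming (Viterbi-style) sketch in Python. It is not the algorithm from the paper: the function name `select_units`, the `target_cost` and `join_cost` callables, and the weights `w_target` and `w_join` are all hypothetical, and the sketch assumes one fixed list of candidate units per target position rather than variable-length units.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Unit = TypeVar("Unit")


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
    w_target: float = 1.0,
    w_join: float = 1.0,
) -> Tuple[float, List[Unit]]:
    """Return (total_cost, units) minimizing the weighted sum of costs.

    candidates[i] holds the inventory units that could realize position i.
    All names and the cost model are illustrative assumptions.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate unit")

    # best[j]: cheapest cost of any path ending in candidates[i][j]
    best = [w_target * target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            # Cost of reaching u from each candidate at the previous position
            costs = [
                best[k] + w_join * join_cost(p, u)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + w_target * target_cost(i, u))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final unit
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[i][j] for i, j in enumerate(path)]


# Toy example: units are (label, pitch in Hz); the join cost is the pitch jump.
toy_candidates = [[("a1", 100), ("a2", 120)], [("b1", 125), ("b2", 90)]]
toy_total, toy_units = select_units(
    toy_candidates,
    target_cost=lambda i, u: 0.0,
    join_cost=lambda p, u: abs(p[1] - u[1]),
)
```

In this toy setup the search prefers the pair with the smallest pitch discontinuity. Changing `w_target` and `w_join` changes which path wins, which is why the choice of weights matters for the resulting speech.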
Bibliographic reference. Lee, Minkyu / Lopresti, Daniel P. / Olive, Joseph P. (2001): "A text-to-speech platform for variable length optimal unit searching using perceptual cost functions", In SSW4-2001, paper 122.