Fourth ISCA ITRW on Speech Synthesis

August 29 - September 1, 2001
Perthshire, Scotland

A Text-to-Speech Platform for Variable Length Optimal Unit Searching Using Perceptual Cost Functions

Minkyu Lee, Daniel P. Lopresti, and Joseph P. Olive

Bell Labs, Lucent Technologies, Murray Hill, NJ, USA

In concatenative Text-to-Speech, the size of the speech corpus is closely related to synthetic speech quality. In this paper, we describe our work on a new corpus-based Bell Labs TTS system. The work encompasses large acoustic inventories with a rich set of annotations, models and data structures for representing and managing such inventories, and an optimal unit selection algorithm that accommodates a broad range of possible cost criteria. We also propose a new method for setting the weights in the cost functions based on a perceptual preference test. Our results show that this approach can successfully predict human preference patterns. Synthetic speech using weights determined in this manner consistently demonstrates smoother transitions and higher voice quality than speech using manually set weights.
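As an illustration of what an optimal unit search over weighted cost functions can look like, the following is a minimal sketch, not the system described in the paper. It assumes a toy setting in which every candidate unit is reduced to a single numeric feature (for example, a pitch value), the target cost and join cost are both absolute differences, and the search is a standard Viterbi-style dynamic program. The function name `select_units` and the parameters `w_target` and `w_join` are hypothetical; variable-length units and the perceptually trained weights from the paper are not modelled here.

```python
from typing import List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[float]],
    targets: Sequence[float],
    w_target: float = 1.0,
    w_join: float = 1.0,
) -> Tuple[List[int], float]:
    """Pick one candidate per position so the weighted total cost is minimal.

    candidates[i] holds the feature values of the units available at
    position i; targets[i] is the desired feature value at that position.
    Returns (chosen candidate index per position, total cost).
    """
    if not candidates or len(candidates) != len(targets):
        raise ValueError("candidates and targets must be non-empty and equal in length")
    if any(len(row) == 0 for row in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[i][j]: cheapest cost of any path that ends at candidate j of position i.
    best = [[w_target * abs(c - targets[0]) for c in candidates[0]]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor of candidate j at position i

    for i in range(1, len(candidates)):
        row, ptr = [], []
        for c in candidates[i]:
            target_cost = w_target * abs(c - targets[i])
            # Cost of reaching c from each candidate at the previous position.
            reach = [
                best[i - 1][k] + w_join * abs(c - prev)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(reach)), key=reach.__getitem__)
            row.append(reach[k_min] + target_cost)
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    cands = [[100.0, 120.0], [110.0, 150.0], [115.0, 90.0]]
    tgts = [120.0, 150.0, 90.0]
    # Ignoring joins favours exact target matches; weighting joins heavily
    # favours a smoother sequence, so the two settings pick different units.
    print(select_units(cands, tgts, w_target=1.0, w_join=0.0))
    print(select_units(cands, tgts, w_target=0.1, w_join=1.0))
```

In this sketch the two weights alone decide which path wins, which is why the choice of weights matters: changing them trades closeness to the target against smoothness at the joins.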

Bibliographic reference.  Lee, Minkyu / Lopresti, Daniel P. / Olive, Joseph P. (2001): "A text-to-speech platform for variable length optimal unit searching using perceptual cost functions", In SSW4-2001, paper 122.