Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Perceptually-Based Data-Driven Join Costs: Comparing Join Types

Ann K. Syrdal, Alistair D. Conkie

AT&T Labs Research, USA

Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database, producing intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve better joins between concatenated units. Results from a psychoacoustic experiment, acoustic parameters, and phonetic factors are analyzed and used in statistical training of join costs so that audible discontinuities at concatenation boundaries can be minimized.

Bibliographic reference. Syrdal, Ann K. / Conkie, Alistair D. (2005): "Perceptually-based data-driven join costs: comparing join types", in INTERSPEECH-2005, 2813-2816.