7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Objective Distance Measures for Spectral Discontinuities in Concatenative Speech Synthesis

Jithendra Vepa (1), Simon King (1), Paul Taylor (2)

(1) University of Edinburgh, U.K.; (2) Rhetorical Systems, U.K.

In unit-selection-based concatenative speech synthesis systems, the join cost, which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. The ideal join cost will measure perceived discontinuity, based on easily measurable spectral properties of the units being joined, in order to ensure smooth and natural-sounding synthetic speech. In this paper we report a perceptual experiment conducted to measure the correlation between subjective human perception and various objective spectrally based measures proposed in the literature. Our experiments used a state-of-the-art unit-selection text-to-speech system: rVoice from Rhetorical Systems Ltd.
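As a rough illustration of the kind of comparison described above, the sketch below computes one possible objective measure (a Euclidean distance between the spectral feature vectors on either side of a join) and correlates it with listener ratings using Pearson's coefficient. The abstract does not say which features, distance measures, or correlation statistic the paper uses, so the function names, the choice of Euclidean distance, and the choice of Pearson correlation are all assumptions made for illustration only.

```python
import math


def euclidean_join_cost(left_frame, right_frame):
    """Euclidean distance between two equal-length spectral feature vectors.

    Assumption: `left_frame` is the last frame of the left unit and
    `right_frame` is the first frame of the right unit. The feature type
    (for example cepstral coefficients) is not specified here.
    """
    if len(left_frame) != len(right_frame):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left_frame, right_frame)))


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper: three joins, each
    # given as (last frame of left unit, first frame of right unit).
    joins = [
        ([1.0, 0.5], [1.1, 0.4]),
        ([0.2, 0.9], [0.8, 0.1]),
        ([0.3, 0.3], [0.5, 0.6]),
    ]
    # Hypothetical listener ratings of perceived discontinuity per join.
    ratings = [1.0, 4.5, 2.0]

    costs = [euclidean_join_cost(left, right) for left, right in joins]
    print(pearson_correlation(costs, ratings))
```

A measure whose values correlate strongly with the listener ratings would, under these assumptions, be a better candidate for a join cost than one that correlates weakly.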



Bibliographic reference. Vepa, Jithendra / King, Simon / Taylor, Paul (2002): "Objective distance measures for spectral discontinuities in concatenative speech synthesis", in ICSLP-2002, 2605-2608.