5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A Perceptual Evaluation of Distance Measures for Concatenative Speech Synthesis

Johan Wouters, Michael W. Macon

Center for Spoken Language Understanding, USA

In concatenative synthesis, new utterances are created by concatenating segments (units) of recorded speech. When the segments are extracted from a large speech corpus, a key issue is selecting segments that will sound natural in a given phonetic context. Distance measures are often used for this task, but little is known about their perceptual relevance. More insight into the relationship between computed distances and perceptual differences is needed to develop accurate unit selection algorithms and to improve the quality of the resulting synthetic speech. In this paper, we develop a perceptual test to measure subtle phonetic differences between speech units. We use the perceptual data to evaluate several popular distance measures. The results show that distance measures that use frequency warping perform better than those that do not, and that little additional advantage is gained by using weighted distances or delta features.
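As a rough illustration of the kind of evaluation described above, the sketch below scores a distance measure by how well its computed distances track listener judgments. It is not the authors' procedure: the mel formula is one common frequency warping, Pearson correlation is assumed as the agreement statistic, and the names `hz_to_mel`, `euclidean`, and `pearson` are hypothetical helpers introduced here.

```python
import math

def hz_to_mel(f_hz):
    """One common mel-scale approximation (an assumed example of frequency warping)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def euclidean(x, y, weights=None):
    """Euclidean distance between two feature vectors, optionally weighted per dimension."""
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def pearson(xs, ys):
    """Pearson correlation, here assumed as the agreement score between
    computed distances and perceptual difference ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)
```

Under these assumptions, each candidate measure would be applied to the same unit pairs that listeners rated, and the measure whose distances correlate most strongly with the ratings would be preferred. Features computed on a warped frequency axis (for example via `hz_to_mel`) and on a linear axis can then be compared on equal terms.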


Bibliographic reference. Wouters, Johan / Macon, Michael W. (1998): "A perceptual evaluation of distance measures for concatenative speech synthesis", in ICSLP-1998, paper 0905.