ISCA Archive ICSLP 1998

A perceptual evaluation of distance measures for concatenative speech synthesis

Johan Wouters, Michael W. Macon

In concatenative synthesis, new utterances are created by concatenating segments (units) of recorded speech. When the segments are extracted from a large speech corpus, a key issue is to select segments that will sound natural in a given phonetic context. Distance measures are often used for this task. However, little is known about the perceptual relevance of these measures. More insight into the relationship between computed distances and perceptual differences is needed to develop accurate unit selection algorithms, and to improve the quality of the resulting synthesized speech. In this paper, we develop a perceptual test to measure subtle phonetic differences between speech units. We use the perceptual data to evaluate several popular distance measures. The results show that distance measures that use frequency warping perform better than those that do not, and that little additional advantage is gained by using weighted distances or delta features.
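To make the terms "frequency warping" and "weighted distance" concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which warping or spectral features were tested, so the mel formula, the linear-interpolation resampling, and all function names here (`hz_to_mel`, `warp_to_mel`, `euclidean`) are assumptions chosen for illustration.

```python
import math

def hz_to_mel(f_hz):
    # A commonly used mel-scale approximation (an assumption; the
    # abstract does not name a specific warping function).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def warp_to_mel(spectrum, sample_rate, n_points):
    """Resample a linear-frequency spectrum (bins spanning 0..Nyquist)
    at n_points (>= 2) frequencies equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    last = len(spectrum) - 1
    warped = []
    for i in range(n_points):
        f_hz = mel_to_hz(max_mel * i / (n_points - 1))
        pos = min(f_hz / nyquist * last, last)  # fractional bin index
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring bins.
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped

def euclidean(a, b, weights=None):
    """Euclidean distance between two feature vectors; optional
    per-dimension weights give a weighted distance."""
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
```

Under these assumptions, a warped distance between two spectral frames `s1` and `s2` would be `euclidean(warp_to_mel(s1, sr, n), warp_to_mel(s2, sr, n))`, while the unwarped counterpart is simply `euclidean(s1, s2)`.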


doi: 10.21437/ICSLP.1998-51

Cite as: Wouters, J., Macon, M.W. (1998) A perceptual evaluation of distance measures for concatenative speech synthesis. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0905, doi: 10.21437/ICSLP.1998-51

@inproceedings{wouters98_icslp,
  author={Johan Wouters and Michael W. Macon},
  title={{A perceptual evaluation of distance measures for concatenative speech synthesis}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0905},
  doi={10.21437/ICSLP.1998-51}
}