Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Comparing Spectral Distance Measures for Join Cost Optimization in Concatenative Speech Synthesis

Ingmund Bjørkan, Torbjørn Svendsen, Snorre Farner

Norwegian University of Science & Technology, Norway

In concatenative synthesis, the join cost function can be related to the probability of a perceived discontinuity at the join. It is therefore important that the distance measures in the cost function correlate strongly with discontinuities perceived by human listeners. This paper presents the results of a listening test on joins in two Norwegian long vowels, /A:/ and /e:/. Five spectral distance measures and the F0 difference are compared as predictors of human-perceived discontinuities using Receiver Operating Characteristic (ROC) curves. In addition, a linear join cost function is optimized by means of stepwise linear regression.
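The ROC comparison described above can be illustrated with a minimal sketch. It assumes that each join has a single distance value (from a spectral measure or the F0 difference) and a binary listener label (1 = perceived discontinuity, 0 = smooth). It also assumes that the area under the ROC curve (AUC) is used as a one-number summary. The function names `roc_points` and `roc_auc` and the example data are hypothetical illustrations, not the authors' implementation or results.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) pairs obtained by
    sweeping a decision threshold from the largest distance to the smallest."""
    pos = sum(labels)            # joins judged discontinuous
    neg = len(labels) - pos      # joins judged smooth
    if pos == 0 or neg == 0:
        raise ValueError("need both discontinuous and smooth joins")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # A join is predicted "discontinuous" when its distance is >= t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical distances for six joins and the corresponding listener labels.
distances = [0.2, 0.5, 0.9, 1.4, 0.3, 1.1]
perceived = [0, 1, 1, 1, 0, 0]
print(roc_auc(roc_points(distances, perceived)))  # 7/9, about 0.778
```

An AUC of 0.5 means that the measure separates discontinuous joins from smooth ones no better than chance. An AUC of 1.0 means that every discontinuous join has a larger distance than every smooth one. Computing the AUC for each candidate measure on the same set of joins gives one way to rank the measures as predictors.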

