Interspeech'2005 - Eurospeech
In concatenative synthesis, the join cost function can be related to the probability of a perceived discontinuity at the join. It is therefore important that the distance measures in the cost function correlate highly with human-perceived discontinuities. This paper presents the results of a listening test on joins in two Norwegian long vowels, /A:/ and /e:/. Five spectral distance measures and the F0 difference are compared as predictors of the human-perceived discontinuities using Receiver Operating Characteristic (ROC) curves. In addition, a linear join cost function is optimized by means of stepwise linear regression.
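As a rough illustration of the ROC comparison described in the abstract, the sketch below scores two candidate predictors against binary listener judgments. It is not the authors' implementation: the function names `roc_points` and `roc_auc`, and all the numbers, are hypothetical and invented purely for illustration. It assumes each join has one distance value per measure and a 0/1 label saying whether listeners perceived a discontinuity.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A join is predicted "discontinuous" when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen perceived-discontinuous join scores higher than a randomly chosen
    clean join (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): six joins, two candidate measures.
perceived = [0, 1, 0, 1, 0, 1]                    # 1 = listeners heard a discontinuity
spectral_distance = [0.2, 0.9, 0.4, 0.8, 0.1, 0.7]
f0_difference = [5.0, 3.0, 8.0, 6.0, 2.0, 4.0]

auc_spectral = roc_auc(spectral_distance, perceived)
auc_f0 = roc_auc(f0_difference, perceived)
```

On this invented data the spectral measure separates the two classes perfectly and the F0 difference does not, so the first would rank as the better predictor. A measure with a larger area under its ROC curve is, in this sense, a better detector of perceived discontinuities.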
Bibliographic reference. Bjørkan, Ingmund / Svendsen, Torbjørn / Farner, Snorre (2005): "Comparing spectral distance measures for join cost optimization in concatenative speech synthesis", In INTERSPEECH-2005, 2577-2580.