ISCA Archive Interspeech 2005

Comparing spectral distance measures for join cost optimization in concatenative speech synthesis

Ingmund Bjørkan, Torbjørn Svendsen, Snorre Farner

In concatenative synthesis, the join cost function can be related to the probability of a perceived discontinuity at the join. It is therefore important that the distance measures in the cost function correlate highly with human-perceived discontinuities. This paper presents the results of a listening test on joins in two Norwegian long vowels, /A:/ and /e:/. Five spectral distance measures and the F0 difference are compared as predictors of human-perceived discontinuities using Receiver Operating Characteristic (ROC) curves. In addition, a linear join cost function is optimized by means of stepwise linear regression.
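
To illustrate how a distance measure can be scored as a predictor of perceived discontinuities, the following minimal sketch builds a ROC curve by sweeping a detection threshold over the distance values and summarizes it with the area under the curve. It shows the standard ROC construction, not necessarily the exact procedure used in the paper. The function names `roc_points` and `auc` are hypothetical, and the toy distances and listener labels are invented for illustration only; they are not data from the listening test.

```python
def roc_points(distances, perceived):
    """Return ROC points (false-positive rate, true-positive rate).

    distances: one distance value per join (larger = more dissimilar).
    perceived: 1 if listeners perceived a discontinuity at the join, else 0.
    A join is "detected" when its distance is at or above the threshold.
    """
    positives = sum(perceived)
    negatives = len(perceived) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both perceived and non-perceived joins are required")

    # Rank joins from largest to smallest distance, then lower the threshold
    # one distinct distance value at a time.
    ranked = sorted(zip(distances, perceived), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Joins with identical distances cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (invented for illustration, not taken from the paper).
toy_distances = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_perceived = [1, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_distances, toy_perceived)
toy_auc = auc(toy_curve)
```

Under this construction, a measure whose curve lies closer to the top-left corner (area closer to 1) separates perceived from non-perceived joins better, while an area near 0.5 indicates chance-level prediction. Computing one curve per distance measure on the same set of labelled joins allows the measures to be compared directly.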


doi: 10.21437/Interspeech.2005-799

Cite as: Bjørkan, I., Svendsen, T., Farner, S. (2005) Comparing spectral distance measures for join cost optimization in concatenative speech synthesis. Proc. Interspeech 2005, 2577-2580, doi: 10.21437/Interspeech.2005-799

@inproceedings{bjrkan05_interspeech,
  author={Ingmund Bjørkan and Torbjørn Svendsen and Snorre Farner},
  title={{Comparing spectral distance measures for join cost optimization in concatenative speech synthesis}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2577--2580},
  doi={10.21437/Interspeech.2005-799}
}