ISCA Archive Interspeech 2008

Investigating Festival's target cost function using perceptual experiments

Volker Strom, Simon King

We describe an investigation of the target cost used in the Festival unit selection speech synthesis system [1]. Our ultimate goal is to automatically learn a perceptually optimal target cost function. In this study, we investigated the behaviour of the target cost for one segment type. The target cost is based on counting the mismatches in several context features. A carrier sentence ("My name is Roger") was synthesised using all 147,820 possible combinations of the diphones /n_ei/ and /ei_m/. 92 representative versions were selected and presented to listeners as 460 pairwise comparisons. The listeners' preference votes were used to analyse the behaviour of the target cost, with respect to the values of its component linguistic context features.
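As an illustration of a target cost "based on counting the mismatches in several context features", the following is a minimal sketch, not Festival's actual implementation. The feature names, the weights, and the normalisation by total weight are assumptions made here for illustration only; the abstract does not specify them.

```python
from typing import Dict, Hashable

# Hypothetical feature weights; the real feature set and weights are not given in the abstract.
WEIGHTS: Dict[str, float] = {
    "stress": 10.0,
    "phrase_position": 7.0,
    "left_context": 4.0,
    "right_context": 3.0,
}


def target_cost(
    target: Dict[str, Hashable],
    candidate: Dict[str, Hashable],
    weights: Dict[str, float],
) -> float:
    """Weighted count of mismatching context features, normalised to [0, 1].

    A feature missing from either dictionary is compared as None, so a
    feature present in only one of the two counts as a mismatch.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    mismatch = sum(
        w for name, w in weights.items()
        if target.get(name) != candidate.get(name)
    )
    return mismatch / total_weight


# Illustrative target specification and candidate unit (made-up values).
target_spec = {
    "stress": 1,
    "phrase_position": "medial",
    "left_context": "m",
    "right_context": "m",
}
candidate_unit = {
    "stress": 0,                  # mismatch
    "phrase_position": "medial",  # match
    "left_context": "m",          # match
    "right_context": "m",         # match
}

cost = target_cost(target_spec, candidate_unit, WEIGHTS)
```

In this sketch a candidate that matches the target on every feature has cost 0, and each mismatching feature adds its weight's share of the total. Learning a perceptually optimal cost, as the abstract proposes, would amount to choosing the features and weights so that lower cost agrees with listener preference.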


doi: 10.21437/Interspeech.2008-187

Cite as: Strom, V., King, S. (2008) Investigating Festival's target cost function using perceptual experiments. Proc. Interspeech 2008, 1873-1876, doi: 10.21437/Interspeech.2008-187

@inproceedings{strom08_interspeech,
  author={Volker Strom and Simon King},
  title={{Investigating Festival's target cost function using perceptual experiments}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1873--1876},
  doi={10.21437/Interspeech.2008-187}
}