We describe an investigation of the target cost used in the Festival unit selection speech synthesis system. Our ultimate goal is to automatically learn a perceptually optimal target cost function. In this study, we investigated the behaviour of the target cost for one segment type. The target cost is computed by counting the mismatches between target and candidate unit in several linguistic context features. A carrier sentence ("My name is Roger") was synthesised using all 147,820 possible combinations of the diphones /n_ei/ and /ei_m/. From these, 92 representative versions were selected and presented to listeners as 460 pairwise comparisons. The listeners' preference votes were used to analyse the behaviour of the target cost with respect to the values of its component linguistic context features.
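As an illustration of a mismatch-counting target cost of the kind described above, the following is a minimal sketch only. The feature names, the example values, and the optional per-feature weights are hypothetical and are not taken from Festival; with no weights given, the function reduces to a plain count of mismatching context features.

```python
from typing import Mapping, Optional


def target_cost(
    target: Mapping[str, str],
    candidate: Mapping[str, str],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Sum the weights of the context features on which the candidate
    unit differs from the target specification.

    With weights=None every feature of the target counts 1.0, so the
    result is simply the number of mismatches.
    """
    if weights is None:
        weights = {name: 1.0 for name in target}
    return sum(
        weight
        for name, weight in weights.items()
        if target.get(name) != candidate.get(name)
    )


# Hypothetical context features for one diphone (illustration only).
target = {"stress": "stressed", "phrase_position": "medial", "left_phone": "m"}
candidate = {"stress": "stressed", "phrase_position": "final", "left_phone": "n"}

# Hypothetical weights; the abstract does not state any weight values.
example_weights = {"stress": 2.0, "phrase_position": 1.0, "left_phone": 0.5}

unweighted = target_cost(target, candidate)
weighted = target_cost(target, candidate, example_weights)
```

In this toy example the candidate differs from the target in two of the three features, so the unweighted cost is 2.0, while the assumed weights give 1.5. Learning a perceptually motivated cost, as the abstract sets out as the ultimate goal, would amount to choosing such weights from listener preferences rather than fixing them by hand.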
Bibliographic reference. Strom, Volker / King, Simon (2008): "Investigating Festival's target cost function using perceptual experiments", in INTERSPEECH-2008, 1873-1876.