9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Investigating Festival's Target Cost Function Using Perceptual Experiments

Volker Strom, Simon King

University of Edinburgh, UK

We describe an investigation of the target cost used in the Festival unit selection speech synthesis system [1]. Our ultimate goal is to automatically learn a perceptually optimal target cost function. In this study, we investigated the behaviour of the target cost for one segment type. The target cost is based on counting the mismatches in several context features. A carrier sentence ("My name is Roger") was synthesised using all 147,820 possible combinations of the diphones /n_ei/ and /ei_m/. Of these, 92 representative versions were selected and presented to listeners as 460 pairwise comparisons. The listeners' preference votes were used to analyse the behaviour of the target cost with respect to the values of its component linguistic context features.
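
The mismatch-counting idea behind the target cost can be illustrated with a minimal sketch. This is not Festival's implementation: the function name `target_cost`, the feature names, and the optional per-feature weights (defaulting to 1.0, which gives a plain count) are assumptions made for illustration only.

```python
from typing import Mapping, Optional


def target_cost(target: Mapping[str, object],
                candidate: Mapping[str, object],
                weights: Optional[Mapping[str, float]] = None) -> float:
    """Sum the weights of the context features on which the candidate
    unit differs from the target specification.

    With no weights given, every mismatch costs 1.0, so the result is a
    plain mismatch count.
    """
    cost = 0.0
    for feature, wanted in target.items():
        # Features without an explicit weight fall back to 1.0.
        weight = 1.0 if weights is None else weights.get(feature, 1.0)
        # A feature missing from the candidate counts as a mismatch
        # (unless the target value is itself None).
        if candidate.get(feature) != wanted:
            cost += weight
    return cost


# Hypothetical context features, not taken from the paper.
target = {"stress": 1, "phrase_position": "medial", "left_context": "n"}
candidate = {"stress": 1, "phrase_position": "final", "left_context": "m"}

mismatches = target_cost(target, candidate)
```

Here the candidate differs from the target in two of the three features, so the unweighted cost is 2.0. Passing a `weights` mapping shows how individual features could be made to count more or less, which is the kind of behaviour a perceptually tuned cost function would adjust.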

Bibliographic reference. Strom, Volker / King, Simon (2008): "Investigating Festival's target cost function using perceptual experiments", In INTERSPEECH-2008, 1873-1876.