Cost functions employed in unit selection significantly influence the quality of the speech output. Although unit selection can produce very natural-sounding speech, the quality can be inconsistent and is difficult to guarantee because of discontinuities between incompatible units. The join cost, which unit selection uses to measure the suitability of concatenating speech units, typically consists of sub-costs representing the fundamental frequency and the spectrum at the boundaries of each unit. In this study, the role of spectral dynamics as a join cost in unit selection synthesis is explored. A number of spectral dynamic measures are tested on the task of detecting discontinuities. Results indicate that spectral dynamic measures correlate with human perception of discontinuity if the features are extracted appropriately. Spectral dynamic mismatch is found to be a source of discontinuity, although results suggest that it is likely to occur simultaneously with static spectral mismatch.
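To make the idea of static and dynamic sub-costs concrete, here is a minimal sketch of a join cost that adds a spectral dynamic term to the usual F0 and static spectral terms. It is not the measure used in the study. The function names (`delta`, `join_cost`), the Euclidean distance, the regression-based delta with a window of 2 frames, the edge clamping, and the equal weights are all illustrative assumptions. Each unit is represented as a list of per-frame spectral feature vectors (for example, cepstral coefficients), and the deltas at the join are computed inside each unit separately.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def delta(frames: List[Frame], t: int, n_window: int = 2) -> List[float]:
    """Regression-based delta at frame t.

    Frame indices that fall outside the unit are clamped to its first or last
    frame, so the delta only uses frames from this unit (an assumption).
    """
    last = len(frames) - 1
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    out = [0.0] * len(frames[0])
    for n in range(1, n_window + 1):
        nxt = frames[min(t + n, last)]
        prv = frames[max(t - n, 0)]
        for d in range(len(out)):
            out[d] += n * (nxt[d] - prv[d])
    return [v / denom for v in out]


def join_cost(left: List[Frame], right: List[Frame],
              left_f0: float, right_f0: float,
              w_f0: float = 1.0, w_static: float = 1.0,
              w_dynamic: float = 1.0) -> float:
    """Hypothetical join cost: weighted sum of F0, static and dynamic sub-costs."""
    # F0 mismatch at the boundary.
    f0_cost = abs(left_f0 - right_f0)
    # Static spectral mismatch: last frame of the left unit vs first of the right.
    static_cost = euclidean(left[-1], right[0])
    # Spectral dynamic mismatch: slope leaving the left unit vs slope entering the right.
    dynamic_cost = euclidean(delta(left, len(left) - 1), delta(right, 0))
    return w_f0 * f0_cost + w_static * static_cost + w_dynamic * dynamic_cost


if __name__ == "__main__":
    rising = [[0.0], [1.0], [2.0]]
    # Smooth continuation: boundary frames and slopes both match.
    print(join_cost(rising, [[2.0], [3.0], [4.0]], 120.0, 120.0))
    # Boundary frames match, but the trajectory reverses direction.
    print(join_cost(rising, [[2.0], [1.0], [0.0]], 120.0, 120.0))
```

In the second call the static and F0 sub-costs are both zero, yet the join still receives a non-zero cost because the spectral trajectory changes direction at the boundary. That is the kind of mismatch a static-only join cost would miss.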
Bibliographic reference. Kirkpatrick, Barry / O'Brien, Darragh / Scaife, Ronán / Errity, Andrew (2007): "On the role of spectral dynamics in unit selection speech synthesis", In INTERSPEECH-2007, 2889-2892.