ISCA Archive Interspeech 2006

Feature extraction for spectral continuity measures in concatenative speech synthesis

Barry Kirkpatrick, Darragh O’Brien, Ronán Scaife

The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity are difficult to define, and standard measures often do not accurately reflect human perception of discontinuity across a concatenated join. In this study, the performance of a number of standard distance measures is compared for the task of detecting audible discontinuities in concatenated speech. Feature sets derived from the phase spectrum are also investigated. Feature extraction based on wavelet analysis is proposed to overcome some of the limitations of the standard measures tested. Receiver Operating Characteristic (ROC) curves are constructed for each measure from the results of a perceptual experiment and are used to rank the measures. Results indicate that features derived from the phase spectrum are comparable to those derived from the magnitude spectrum as a join cost for spectral continuity. Measures based on wavelet transform coefficients outperform all other measures tested.
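The evaluation described above scores each join with a distance measure and checks those scores against listeners' judgements via an ROC curve. The following is a minimal sketch of that procedure under stated assumptions, not the authors' implementation. It assumes each join has one scalar cost, computed here as a Euclidean distance between feature vectors on either side of the join, and a binary listener label (audible or inaudible). It also assumes that the area under the curve (AUC) is used to summarise each ROC curve. The function names `join_cost`, `roc_points` and `roc_auc` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def join_cost(left: Sequence[float], right: Sequence[float]) -> float:
    """Hypothetical join cost: Euclidean distance between the feature vectors
    on either side of a join (e.g. last frame of one unit, first of the next).
    The feature set itself (MFCC, phase, wavelet, ...) is an assumption."""
    return math.dist(left, right)


def roc_points(costs: Sequence[float], audible: Sequence[bool]) -> List[Tuple[float, float]]:
    """Sweep a threshold over the join costs. A join is predicted 'audible'
    when its cost >= threshold. Returns (false-positive rate, true-positive rate)
    pairs, from the strictest threshold to the most lenient."""
    pos = sum(1 for a in audible if a)
    neg = len(audible) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one audible and one inaudible join")
    points = [(0.0, 0.0)]
    for t in sorted(set(costs), reverse=True):
        tp = sum(1 for c, a in zip(costs, audible) if c >= t and a)
        fp = sum(1 for c, a in zip(costs, audible) if c >= t and not a)
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule:
    1.0 = perfect separation of audible from inaudible joins, 0.5 = chance."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical costs for four joins and the corresponding listener labels.
    costs = [0.9, 0.8, 0.3, 0.1]
    labels = [True, True, False, False]
    print(roc_auc(roc_points(costs, labels)))  # 1.0
```

Each candidate measure, meaning a different feature set fed to `join_cost`, would produce its own list of costs over the same set of joins. The measures could then be ordered by their AUC.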


doi: 10.21437/Interspeech.2006-483

Cite as: Kirkpatrick, B., O’Brien, D., Scaife, R. (2006) Feature extraction for spectral continuity measures in concatenative speech synthesis. Proc. Interspeech 2006, paper 1385-Wed2A3O.1, doi: 10.21437/Interspeech.2006-483

@inproceedings{kirkpatrick06_interspeech,
  author={Barry Kirkpatrick and Darragh O’Brien and Ronán Scaife},
  title={{Feature extraction for spectral continuity measures in concatenative speech synthesis}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1385-Wed2A3O.1},
  doi={10.21437/Interspeech.2006-483}
}