In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFSTs). We also introduce splicing costs as a measure of which unit boundaries are particularly good or poor join points. Splicing costs extend the flexibility offered by the unit selection paradigm. Through a perceptual experiment we demonstrate an improvement in speech quality achieved by using splicing costs during unit selection.
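As a rough illustration of where splicing costs can enter the search, the Python sketch below uses a plain Viterbi dynamic program in place of the WFST shortest-path computation described in the paper. The `Unit` fields, the rule that units contiguous in the database join at zero cost, and the additive combination of a fixed concatenation cost with the splicing costs of both boundaries are illustrative assumptions, not the authors' formulation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    uid: int            # position of the unit in a hypothetical recorded database
    target_cost: float  # mismatch between this unit and the requested target
    splice_in: float    # assumed cost of making a join at the unit's left boundary
    splice_out: float   # assumed cost of making a join at the unit's right boundary


def join_cost(prev: Unit, cur: Unit, concat: float) -> float:
    # Assumption: units that were adjacent in the recording join for free.
    if cur.uid == prev.uid + 1:
        return 0.0
    # Otherwise pay a concatenation cost plus the splicing costs of the
    # two boundaries that become a join point.
    return concat + prev.splice_out + cur.splice_in


def select_units(candidates: Sequence[Sequence[Unit]],
                 concat: float = 1.0) -> Tuple[float, List[int]]:
    """Viterbi search over one candidate list per target slot."""
    costs = [u.target_cost for u in candidates[0]]
    backptrs: List[List[int]] = []
    for t in range(1, len(candidates)):
        prev_slot = candidates[t - 1]
        new_costs: List[float] = []
        ptrs: List[int] = []
        for cur in candidates[t]:
            # Best predecessor = lowest accumulated cost plus join cost.
            best_i = min(range(len(prev_slot)),
                         key=lambda i: costs[i] + join_cost(prev_slot[i], cur, concat))
            new_costs.append(costs[best_i]
                             + join_cost(prev_slot[best_i], cur, concat)
                             + cur.target_cost)
            ptrs.append(best_i)
        costs = new_costs
        backptrs.append(ptrs)
    # Trace back from the cheapest candidate in the final slot.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    path = [j]
    for ptrs in reversed(backptrs):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return total, [candidates[t][k].uid for t, k in enumerate(path)]
```

For example, with `cands = [[Unit(10, 0.2, 0.0, 0.5), Unit(40, 0.0, 0.0, 2.0)], [Unit(11, 0.3, 0.5, 0.0), Unit(70, 0.0, 0.1, 0.0)]]`, `select_units(cands)` picks units 10 and 11: unit 40 has the lower target cost, but its high right-boundary splicing cost makes it a poor place to join.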
Cite as: Bulyko, I., Ostendorf, M. (2001) Unit selection for speech synthesis using splicing costs with weighted finite state transducers. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 987-990, doi: 10.21437/Eurospeech.2001-262
@inproceedings{bulyko01_eurospeech,
  author={Ivan Bulyko and Mari Ostendorf},
  title={{Unit selection for speech synthesis using splicing costs with weighted finite state transducers}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={987--990},
  doi={10.21437/Eurospeech.2001-262}
}