Concatenated waveform text-to-speech synthesis systems require an inventory of stored waveforms from which units of speech can be extracted, rearranged, and concatenated as needed. In previous papers [1], [2] we have argued that, for natural-sounding speech, the syllable should be the preferred unit. The mark-up of the stored waveforms for segmentation into syllables must be precise, and for our MeteoSPRUCE limited-domain system it has so far been done by manual editing. In this paper we describe how most of the segmentation can be done automatically, leaving only the error-prone waveforms to be segmented manually. With automatic labelling of both the pitch periods and the syllables, generating different synthetic voices to order becomes feasible.
Cite as: Lewis, E., Tatham, M. (2001) Automatic segmentation of recorded speech into syllables for speech synthesis. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1703-1706, doi: 10.21437/Eurospeech.2001-399
@inproceedings{lewis01_eurospeech,
  author={Eric Lewis and Mark Tatham},
  title={{Automatic segmentation of recorded speech into syllables for speech synthesis}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1703--1706},
  doi={10.21437/Eurospeech.2001-399}
}