This paper presents a speech segmentation scheme designed for creating voice inventories for speech synthesis. Information about the phoneme segments in a speech corpus is not sufficient on its own for speech synthesis. Multiple layers of segments, such as breath groups, accent phrases, phonemes, and pitchmarks, are all necessary to reproduce the prosody and acoustics of a given speaker. The segmentation algorithm devised here can extract this multi-layered segmental information in a distinctly organized fashion, and it is fairly robust to differences in speakers and speaking styles. Experimental evaluations on speech corpora with a fairly wide variation in speaking styles show that the algorithm is quite accurate and robust in extracting both phoneme segments and accent-phrase segments.
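The abstract does not specify how the layers are stored. The following is a minimal Python sketch, assuming a simple tree: each breath group contains accent phrases, each accent phrase contains phonemes, and pitchmarks are time instants attached to phonemes. The `Segment` class, its `is_well_nested` and `layer_items` methods, and the example labels and times are hypothetical illustrations rather than the paper's algorithm or data. The sketch only shows what a "distinctly organized" multi-layer segmentation might look like and how its nesting could be checked.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """One interval in a hypothetical multi-layer segmentation (times in seconds)."""

    layer: str  # hypothetical layer names: "breath_group", "accent_phrase", "phoneme"
    label: str
    start: float
    end: float
    children: list[Segment] = field(default_factory=list)
    pitchmarks: list[float] = field(default_factory=list)  # pitchmark times inside this segment

    def is_well_nested(self) -> bool:
        """True if children are time-ordered, non-overlapping, inside the parent,
        and every pitchmark falls within this segment (checked recursively)."""
        prev_end = self.start
        for child in self.children:
            if child.start < prev_end or child.end > self.end:
                return False
            if not child.is_well_nested():
                return False
            prev_end = child.end
        return all(self.start <= t <= self.end for t in self.pitchmarks)

    def layer_items(self, layer: str) -> list[Segment]:
        """Collect all segments on the given layer, in time order."""
        found = [self] if self.layer == layer else []
        for child in self.children:
            found.extend(child.layer_items(layer))
        return found


# Hypothetical example: one breath group with two accent phrases.
bg = Segment("breath_group", "bg1", 0.0, 1.2, children=[
    Segment("accent_phrase", "ap1", 0.0, 0.6, children=[
        Segment("phoneme", "k", 0.0, 0.1),
        Segment("phoneme", "a", 0.1, 0.6, pitchmarks=[0.15, 0.22, 0.29]),
    ]),
    Segment("accent_phrase", "ap2", 0.6, 1.2, children=[
        Segment("phoneme", "s", 0.6, 0.75),
        Segment("phoneme", "a", 0.75, 1.2, pitchmarks=[0.8, 0.87]),
    ]),
])

print(bg.is_well_nested())                              # True
print([p.label for p in bg.layer_items("phoneme")])     # ['k', 'a', 's', 'a']
```

Keeping every layer in one tree, rather than in separate flat lists, means that a boundary error on one layer (for example, a phoneme spilling past its accent phrase) shows up as a nesting violation.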
Cite as: Saito, T. (2005) A method of multi-layered speech segmentation tailored for speech synthesis. Proc. Interspeech 2005, 1153-1156, doi: 10.21437/Interspeech.2005-442
@inproceedings{saito05_interspeech, author={Takashi Saito}, title={{A method of multi-layered speech segmentation tailored for speech synthesis}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={1153--1156}, doi={10.21437/Interspeech.2005-442} }