Interspeech'2005 - Eurospeech
This paper presents a speech segmentation scheme designed for creating voice inventories for speech synthesis. Information about phoneme segments alone is not sufficient for speech synthesis; multiple layers of segments, such as breath groups, accent phrases, phonemes, and pitchmarks, are all necessary to reproduce the prosody and acoustics of a given speaker. The segmentation algorithm devised here extracts this multi-layered segmental information in a distinctly organized fashion and is fairly robust to differences in speakers and speaking styles. Experimental evaluations on speech corpora with a fairly large variation of speaking styles show that the algorithm is quite accurate and robust in extracting both phoneme segments and accentual phrase segments.
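To make the idea of multi-layered segmental information concrete, the following is a minimal sketch of one way the four layers named above could be stored as a nested structure. It is not the paper's data format or algorithm: the `Segment` class, the `LAYERS` ordering, the helper functions, and all labels and time values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Layer names taken from the abstract, ordered coarse to fine.
# The ordering and the identifiers are assumptions for this sketch.
LAYERS = ["breath_group", "accent_phrase", "phoneme", "pitchmark"]


@dataclass
class Segment:
    layer: str
    label: str
    start: float  # seconds
    end: float    # seconds; a pitchmark is an instant, so start == end
    children: List["Segment"] = field(default_factory=list)


def is_well_nested(seg: Segment) -> bool:
    """Check that every child lies in a finer layer and inside its parent's span."""
    if seg.end < seg.start:
        return False
    for child in seg.children:
        if LAYERS.index(child.layer) <= LAYERS.index(seg.layer):
            return False
        if child.start < seg.start or child.end > seg.end:
            return False
        if not is_well_nested(child):
            return False
    return True


def collect_layer(seg: Segment, layer: str) -> List[Segment]:
    """Return all segments of one layer, in document order."""
    found = [seg] if seg.layer == layer else []
    for child in seg.children:
        found.extend(collect_layer(child, layer))
    return found


# Toy utterance with invented labels and times (not data from the paper).
pitchmarks = [Segment("pitchmark", "pm", t, t) for t in (0.05, 0.06)]
ph_a = Segment("phoneme", "a", 0.00, 0.10, pitchmarks)
ph_k = Segment("phoneme", "k", 0.10, 0.18)
phrase = Segment("accent_phrase", "AP1", 0.00, 0.18, [ph_a, ph_k])
breath_group = Segment("breath_group", "BG1", 0.00, 0.18, [phrase])
```

With this layout, `collect_layer(breath_group, "phoneme")` yields the phoneme segments in order, and `is_well_nested(breath_group)` checks that each finer layer stays inside the span of the coarser one that contains it.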
Bibliographic reference. Saito, Takashi (2005): "A method of multi-layered speech segmentation tailored for speech synthesis", In INTERSPEECH-2005, 1153-1156.