ISCA Archive Interspeech 2005

A method of multi-layered speech segmentation tailored for speech synthesis

Takashi Saito

This paper presents a speech segmentation scheme designed for creating voice inventories for speech synthesis. Phoneme segment information alone is not sufficient for speech synthesis; multiple layers of segments, such as breath groups, accent phrases, phonemes, and pitchmarks, are all necessary to reproduce the prosody and acoustics of a given speaker. The segmentation algorithm devised here extracts this multi-layered segmental information in a distinctly organized fashion and is fairly robust to differences in speakers and speaking styles. Experimental evaluations on speech corpora with a fairly large variation in speaking styles show that the segmentation algorithm is accurate and robust in extracting both phoneme and accentual phrase segments.
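
The layered segments named in the abstract (breath groups, accent phrases, phonemes, pitchmarks) can be pictured as nested time spans. The following is only a minimal illustrative sketch, not the paper's algorithm or data format: the `Segment` class, the `pitchmarks_in` helper, and all labels and time values are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """A labelled time span (seconds) that may contain finer-grained child segments."""
    label: str
    start: float
    end: float
    children: List["Segment"] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """Check that children are ordered, non-overlapping, and lie inside this span."""
        cursor = self.start
        for child in self.children:
            if child.start < cursor or child.end > self.end or child.end < child.start:
                return False
            if not child.is_consistent():
                return False
            cursor = child.end
        return True


def pitchmarks_in(segment: Segment, pitchmarks: List[float]) -> List[float]:
    """Return the pitchmarks (sorted times) falling in [segment.start, segment.end)."""
    lo = bisect_left(pitchmarks, segment.start)
    hi = bisect_left(pitchmarks, segment.end)
    return pitchmarks[lo:hi]


# Hypothetical toy utterance: one breath group, one accent phrase, two phonemes.
phonemes = [Segment("k", 0.10, 0.18), Segment("a", 0.18, 0.30)]
accent_phrase = Segment("AP1", 0.10, 0.30, phonemes)
breath_group = Segment("BG1", 0.10, 0.30, [accent_phrase])
pitchmarks = [0.19, 0.20, 0.21, 0.22]  # made-up values inside the voiced vowel

print(breath_group.is_consistent())
print(pitchmarks_in(phonemes[1], pitchmarks))
```

Keeping the pitchmarks as one sorted list per utterance, rather than attaching them to each phoneme, is just one possible design choice; it lets any layer query its own pitchmarks with a binary search.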


doi: 10.21437/Interspeech.2005-442

Cite as: Saito, T. (2005) A method of multi-layered speech segmentation tailored for speech synthesis. Proc. Interspeech 2005, 1153-1156, doi: 10.21437/Interspeech.2005-442

@inproceedings{saito05_interspeech,
  author={Takashi Saito},
  title={{A method of multi-layered speech segmentation tailored for speech synthesis}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1153--1156},
  doi={10.21437/Interspeech.2005-442}
}