Fifth ISCA ITRW on Speech Synthesis
June 14-16, 2004
We present a new approach to phone segmentation for the preparation of databases for concatenative Text-to-Speech synthesis. First, we describe the problem and review the state of the art. We then present some existing techniques for performing this segmentation, followed by our approach, which uses a Regression Tree to perform Boundary Specific Correction of the HMM segmentation. We discuss different evaluation procedures. Finally, we compare several systems and show that our system improves on the HMM-based system, placing 94% of the boundaries within a tolerance of 20 ms of a manual segmentation, and that phonetic rather than acoustic features are better suited to this task.
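The tolerance-based figure quoted above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `boundary_agreement`, the one-to-one pairing of automatic and manual boundaries, the inclusive comparison at exactly 20 ms, and the sample boundary times are all assumptions made for illustration.

```python
def boundary_agreement(auto_ms, manual_ms, tolerance_ms=20):
    """Return the fraction of automatic boundaries that lie within
    tolerance_ms of the corresponding manual boundary.

    Assumes both lists hold boundary times in milliseconds and are
    aligned one-to-one (same phone sequence, same number of boundaries).
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        raise ValueError("at least one boundary is required")
    # A boundary counts as correct when its absolute deviation from the
    # manual reference does not exceed the tolerance (inclusive, by assumption).
    hits = sum(
        1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms
    )
    return hits / len(auto_ms)


# Hypothetical boundary times in milliseconds, for illustration only.
manual = [100, 250, 400, 620]
automatic = [105, 275, 395, 640]
print(boundary_agreement(automatic, manual))  # deviations: 5, 25, 5, 20 ms
```

With these made-up values, three of the four boundaries fall within the 20 ms tolerance, so the function returns 0.75.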
Bibliographic reference. Adell, Jordi / Bonafonte, Antonio (2004): "Towards phone segmentation for concatenative speech synthesis", In SSW5-2004, 139-144.