Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Towards Phone Segmentation for Concatenative Speech Synthesis

Jordi Adell, Antonio Bonafonte

Dept. of Signal Theory and Communication, TALP Research Center, Technical University of Catalonia (UPC), Barcelona, Spain

We present a new approach to the problem of phone segmentation when preparing databases for concatenative Text-to-Speech synthesis. First, we describe the problem and review the state of the art. We then present some existing segmentation techniques and introduce our approach, which uses a Regression Tree to perform Boundary Specific Correction of the HMM segmentation. We discuss different evaluation procedures. Finally, we compare several systems and show that our system improves on the HMM-based system, placing 94% of the boundaries within a tolerance of 20 ms of a manual segmentation, and that phonetic features are better suited to this task than acoustic ones.
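The headline result (94% of boundaries within 20 ms of a manual segmentation) is a tolerance-based accuracy. The sketch below shows one way to compute such a figure. It assumes that automatic and manual boundaries have already been paired one-to-one and are expressed in seconds. The function name `fraction_within_tolerance` and the example values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def fraction_within_tolerance(
    auto_boundaries: Sequence[float],
    manual_boundaries: Sequence[float],
    tolerance: float = 0.020,
) -> float:
    """Return the fraction of automatic boundaries lying within `tolerance`
    seconds of the manual boundary they are paired with.

    Assumption (not from the paper): the i-th automatic boundary corresponds
    to the i-th manual boundary, and all times are in seconds.
    """
    if len(auto_boundaries) != len(manual_boundaries):
        raise ValueError("boundary lists must be aligned one-to-one")
    if len(auto_boundaries) == 0:
        raise ValueError("no boundaries to evaluate")
    # Count the boundaries whose absolute deviation from the reference
    # is no larger than the tolerance (20 ms by default).
    hits = sum(
        1
        for auto, manual in zip(auto_boundaries, manual_boundaries)
        if abs(auto - manual) <= tolerance
    )
    return hits / len(auto_boundaries)


if __name__ == "__main__":
    manual = [0.100, 0.250, 0.400, 0.610]  # hypothetical reference boundaries (s)
    auto = [0.105, 0.262, 0.431, 0.600]    # hypothetical automatic boundaries (s)
    # Deviations are 5, 12, 31 and 10 ms, so 3 of 4 fall within 20 ms.
    print(fraction_within_tolerance(auto, manual))  # prints 0.75
```

Reporting the result at several tolerances (for example 10, 20 and 30 ms) gives a fuller picture than a single threshold, since a system can trade small errors for fewer large ones.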

