5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

On The Use Of F0 Features In Automatic Segmentation For Speech Synthesis

Takashi Saito

Tokyo Research Laboratory, IBM Japan Ltd., Japan

This paper addresses the automatic division of speech utterances into phonemic segments, which are used to construct unit inventories for speech synthesis. We propose a new segmentation parameter called "F0 dynamics" (DF0). The fine structure of F0 contours contains phonemic events that appear as local dips at phonemic transition regions, especially around voiced consonants. We apply this observation to speech segmentation: the DF0 parameter is used in the final stage of the segmentation procedure to refine the phonemic boundaries roughly obtained by dynamic programming (DP) alignment. We evaluate the proposed automatic segmentation on a speech database prepared for unit inventory construction, comparing the resulting boundaries with those of manual segmentation to show the effectiveness of the method. We also discuss the effects of boundary refinement on the synthesized speech.
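
As a rough illustration of the boundary refinement step described above, the following Python sketch moves a rough boundary to the deepest local F0 dip within a small search window. It is not the paper's formulation: the abstract does not define DF0, so the dip measure used here (the sum of the log-F0 rises on either side of a frame) is an assumed stand-in, and the function name `refine_boundary`, the `half_window` parameter, and the convention that 0 marks an unvoiced frame are all hypothetical.

```python
import math


def refine_boundary(f0, rough_idx, half_window=3):
    """Move a rough boundary to the deepest local F0 dip near it.

    f0          -- per-frame F0 values in Hz; 0 marks an unvoiced frame (assumed convention)
    rough_idx   -- frame index of the boundary from the alignment stage
    half_window -- number of frames searched on each side (hypothetical parameter)

    Returns rough_idx unchanged if no dip is found in the window.
    """
    lo = max(1, rough_idx - half_window)
    hi = min(len(f0) - 2, rough_idx + half_window)
    best_idx, best_depth = rough_idx, 0.0
    for i in range(lo, hi + 1):
        # The dip measure needs three consecutive voiced frames.
        if min(f0[i - 1], f0[i], f0[i + 1]) <= 0:
            continue
        # Assumed stand-in for DF0: how far log F0 rises on each side of frame i.
        left = math.log(f0[i - 1]) - math.log(f0[i])
        right = math.log(f0[i + 1]) - math.log(f0[i])
        if left > 0 and right > 0 and left + right > best_depth:
            best_idx, best_depth = i, left + right
    return best_idx


# Synthetic contour with a dip at frame 5; the rough boundary sits at frame 3.
contour = [120, 121, 122, 121, 115, 108, 114, 120, 121, 122]
refined = refine_boundary(contour, rough_idx=3)
```

Keeping the rough boundary when no dip is found reflects the role the abstract gives DF0: a final-stage correction applied on top of the alignment result, not a replacement for it.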

