8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Statistical Corpus-Based Speech Segmentation

Vincent Pollet, Geert Coorman

ScanSoft Belgium, Belgium

An automatic speech segmentation technique is presented that is based on aligning a target speech signal with a set of different reference speech signals. The reference signals are generated by a specifically designed corpus-based speech synthesis system that also outputs phoneme boundary markers. Each reference signal is then warped to the target speech signal. By synthesizing and warping many different reference speech signals, each phoneme boundary of the target signal is characterized by a distribution of warped phoneme boundary positions. These boundary distributions are statistically and acoustically processed to generate the final segmentation. First, some problems related to manual and automatic phoneme segmentation are addressed. Then the technique of Statistical Corpus-based Segmentation (SCS) is introduced. Finally, intra- and inter-speaker segmentation results are presented.
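
The idea outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one-dimensional toy feature sequences in place of real acoustic features, plain dynamic time warping (DTW) as the warping method, and the median as the only statistical processing of each boundary distribution; the acoustic processing step is omitted. The names `dtw_path`, `warp_boundary`, and `segment` are hypothetical.

```python
from statistics import median


def dtw_path(ref, tgt):
    """Align two 1-D feature sequences with DTW; return (ref_idx, tgt_idx) pairs."""
    n, m = len(ref), len(tgt)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - tgt[j - 1])  # assumed local distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end along the cheapest predecessors.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def warp_boundary(path, ref_boundary):
    """Map a reference boundary (frame index) onto the target time axis."""
    for ref_idx, tgt_idx in path:
        if ref_idx >= ref_boundary:
            return tgt_idx
    return path[-1][1]


def segment(target, references):
    """Estimate target boundaries from several (signal, boundaries) references.

    Every reference is assumed to carry the same number of boundary markers,
    since all references realize the same phoneme sequence.
    """
    warped = []  # one list of warped boundary positions per reference
    for signal, boundaries in references:
        path = dtw_path(signal, target)
        warped.append([warp_boundary(path, b) for b in boundaries])
    # Each boundary now has a distribution of positions; summarize by the median.
    return [median(positions) for positions in zip(*warped)]


# Toy usage: three references with different durations, one boundary each.
target = [0, 0, 0, 0, 5, 5, 5, 5, 5, 5]
references = [
    ([0, 0, 5, 5, 5], [2]),
    ([0, 0, 0, 0, 0, 0, 5, 5], [6]),
    ([0, 5, 5, 5], [1]),
]
print(segment(target, references))  # prints [4]
```

In this toy case every reference warps its boundary to target frame 4, so the distribution is degenerate. With real speech the warped positions would scatter around the true boundary, which is why the distribution has to be processed rather than taken from a single alignment.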

