Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Automatic Speech Segmentation with Multiple Statistical Models

Seung Seop Park, Jong Won Shin, Nam Soo Kim

Seoul National University, Korea

In this paper, we propose a novel approach to improving the performance of automatic speech segmentation for concatenative text-to-speech synthesis. Several automatic segmentation machines (ASMs) are applied simultaneously, and the final boundary time marks are drawn from their multiple segmentation results. To identify the best time mark among those provided by the ASMs, we apply a candidate selector trained on a manually segmented speech database. The candidate selector defines a mapping from each phonetic boundary to the index of the ASM expected to output the time mark closest to the manual segmentation result. The experimental results show that our approach dramatically improves the segmentation accuracy.
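The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the candidate selector is built, so everything below is an assumption made for illustration: the selector is a plain lookup table, a boundary is identified by a hypothetical (left phone, right phone) pair, and the "best" ASM for a boundary type is the one with the lowest summed absolute error against the manual marks in the training data. The names `train_selector` and `select_mark` are likewise hypothetical, and the time marks are made-up toy values, not data from the paper.

```python
from collections import defaultdict


def train_selector(training_boundaries, num_asms):
    """Build a lookup table: boundary type -> index of the best ASM.

    training_boundaries: iterable of (boundary_type, asm_marks, manual_mark),
    where asm_marks holds one time mark (in seconds) per ASM.
    Assumption: "best" means lowest summed absolute error on the training set.
    """
    err_by_type = defaultdict(lambda: [0.0] * num_asms)
    total_err = [0.0] * num_asms
    for btype, asm_marks, manual_mark in training_boundaries:
        for k in range(num_asms):
            err = abs(asm_marks[k] - manual_mark)
            err_by_type[btype][k] += err
            total_err[k] += err
    # Within one boundary type every ASM has the same count,
    # so the smallest sum is also the smallest mean.
    table = {
        btype: min(range(num_asms), key=errs.__getitem__)
        for btype, errs in err_by_type.items()
    }
    # Fallback for boundary types never seen in training: best ASM overall.
    fallback = min(range(num_asms), key=total_err.__getitem__)
    return table, fallback


def select_mark(selector, btype, asm_marks):
    """Return the time mark of the ASM chosen for this boundary type."""
    table, fallback = selector
    return asm_marks[table.get(btype, fallback)]


# Toy training data: three ASMs, two boundary types (illustrative values only).
train = [
    (("a", "t"), [0.100, 0.112, 0.105], 0.104),
    (("a", "t"), [0.310, 0.322, 0.301], 0.303),
    (("s", "i"), [0.505, 0.498, 0.520], 0.499),
]
selector = train_selector(train, num_asms=3)
chosen = select_mark(selector, ("a", "t"), [0.200, 0.210, 0.220])
```

A lookup table is used here only because it is the simplest mapping from boundary to ASM index; a selector that generalizes across phonetic contexts would be a natural alternative, and the paper itself should be consulted for the method actually used.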


Bibliographic reference. Park, Seung Seop / Shin, Jong Won / Kim, Nam Soo (2006): "Automatic speech segmentation with multiple statistical models", in INTERSPEECH-2006, paper 1199-Wed3BuP.11.