INTERSPEECH 2006 - ICSLP
In this paper, we propose a novel approach to improving the performance of automatic speech segmentation for concatenative text-to-speech synthesis. Several automatic segmentation machines (ASMs) are applied simultaneously, and the final boundary time marks are drawn from their multiple segmentation results. To identify the best time mark among those provided by the ASMs, we apply a candidate selector trained on a manually segmented speech database. The candidate selector defines a mapping from each phonetic boundary to the index of the ASM whose time mark is expected to be closest to the manual segmentation result. Experimental results show that our approach dramatically improves segmentation accuracy.
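To make the candidate-selector idea concrete, here is a minimal sketch. The abstract does not say which model or boundary features the authors' selector uses, so everything below is an assumption. Each boundary is keyed by a hypothetical `(left_phone, right_phone)` class, and the selector is a plain lookup table. For each class it stores the ASM with the lowest total absolute error against the manual marks in the training data, and it falls back to the globally best ASM for unseen classes. The names `train_selector` and `select_marks` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical boundary class: (left_phone, right_phone). The paper's actual
# boundary representation is not given in the abstract.
BoundaryClass = Tuple[str, str]


def train_selector(
    classes: Sequence[BoundaryClass],
    asm_marks: Sequence[Sequence[float]],
    manual_marks: Sequence[float],
) -> Tuple[Dict[BoundaryClass, int], int]:
    """Map each boundary class to the ASM index with the lowest total
    absolute deviation from the manual time marks (assumed criterion).

    asm_marks[i][k] is the time mark (seconds) that ASM k gives for boundary i.
    Returns the per-class mapping and a global fallback ASM index.
    """
    num_asms = len(asm_marks[0])
    # Per-class and global accumulated absolute errors for each ASM.
    err_sum: Dict[BoundaryClass, List[float]] = defaultdict(lambda: [0.0] * num_asms)
    global_err = [0.0] * num_asms
    for cls, marks, ref in zip(classes, asm_marks, manual_marks):
        for k, t in enumerate(marks):
            e = abs(t - ref)
            err_sum[cls][k] += e
            global_err[k] += e
    # Within a class, argmin of the error sum equals argmin of the mean error.
    mapping = {
        cls: min(range(num_asms), key=lambda k: errs[k])
        for cls, errs in err_sum.items()
    }
    fallback = min(range(num_asms), key=lambda k: global_err[k])
    return mapping, fallback


def select_marks(
    mapping: Dict[BoundaryClass, int],
    fallback: int,
    classes: Sequence[BoundaryClass],
    asm_marks: Sequence[Sequence[float]],
) -> List[float]:
    """For each boundary, output the time mark of the selected ASM."""
    return [marks[mapping.get(cls, fallback)] for cls, marks in zip(classes, asm_marks)]


# Toy training data (made up for illustration): two ASMs, four boundaries.
train_classes = [("a", "t"), ("a", "t"), ("s", "i"), ("s", "i")]
train_asm = [[0.10, 0.12], [0.20, 0.23], [0.50, 0.41], [0.70, 0.62]]
manual = [0.11, 0.21, 0.40, 0.60]
mapping, fallback = train_selector(train_classes, train_asm, manual)

# New utterance: the ("k", "o") class was never seen, so it uses the fallback.
test_classes = [("a", "t"), ("s", "i"), ("k", "o")]
test_asm = [[1.00, 1.05], [2.00, 2.10], [3.00, 3.20]]
selected = select_marks(mapping, fallback, test_classes, test_asm)
print(mapping, fallback, selected)
```

In this toy data, ASM 0 is more accurate at `("a", "t")` boundaries and ASM 1 at `("s", "i")` boundaries, so the combined output can beat either ASM used alone. A trained classifier over richer boundary features could replace the lookup table without changing the overall structure.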
Bibliographic reference. Park, Seung Seop / Shin, Jong Won / Kim, Nam Soo (2006): "Automatic speech segmentation with multiple statistical models", In INTERSPEECH-2006, paper 1199-Wed3BuP.11.