We propose a new approach to automatic speech segmentation for corpus-based speech synthesis. Instead of relying on a single automatic segmentation machine (ASM), we utilize multiple independent ASMs to obtain the final segmentation: given multiple independent time-marks from the various ASMs, we remove the bias of each time-mark and then compute a weighted sum of the bias-removed time-marks. The bias and weight parameters required by the proposed method are estimated for each phonetic context through a training procedure in which manually segmented results serve as the references. The bias parameters are obtained by averaging the corresponding errors, while the weight parameters are jointly optimized with the gradient projection method, which enforces the constraints imposed on the weight parameter space. A decision tree is employed to handle unseen phonetic contexts. Experimental results show that the proposed method substantially improves segmentation accuracy.
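The combination step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bias and weight values are hypothetical placeholders (the paper estimates them per phonetic context from manually segmented training data), and the assumed constraints on the weights (nonnegativity, summing to one) are one plausible reading of the constraint set handled by gradient projection.

```python
import numpy as np

def combine_timemarks(timemarks, biases, weights):
    """Bias-removed, weighted combination of K candidate time-marks.

    timemarks: boundary times (seconds) proposed by K independent ASMs
    biases:    per-ASM bias estimates (assumed: mean error on training data)
    weights:   per-ASM combination weights (assumed: nonnegative, sum to one)
    """
    t = np.asarray(timemarks, dtype=float)
    b = np.asarray(biases, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Illustrative constraint check on the weight parameter space.
    assert np.all(w >= 0.0) and np.isclose(w.sum(), 1.0)
    # Remove each ASM's bias, then take the weighted sum.
    return float(w @ (t - b))

# Example: three ASMs propose boundaries for one phone boundary.
final = combine_timemarks(
    timemarks=[1.02, 0.98, 1.05],
    biases=[0.01, -0.01, 0.02],
    weights=[0.5, 0.3, 0.2],
)
print(final)
```

In this sketch, each ASM's systematic offset is subtracted before combination, so the weighted sum only has to average out the remaining (zero-mean) errors.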
Bibliographic reference. Park, Seung Seop / Shin, Jong Won / Kim, Jong Kyu / Kim, Nam Soo (2007): "A multiple-model based framework for automatic speech segmentation", in Proc. INTERSPEECH 2007, pp. 82-85.