7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Automatic Segmentation Combining an HMM-Based Approach and Spectral Boundary Correction

Yeon-Jun Kim, Alistair Conkie

AT&T Labs - Research, USA

Currently, AT&T Labs' Natural Voices multilingual TTS system produces high-quality synthetic speech using a large-scale speech corpus [1]. In the development of such systems, automatic segmentation is a major component technology. The prevalent approach to automatic segmentation in speech synthesis is based on Hidden Markov Models (HMMs). Although the HMM-based approach is the most automatic and reliable, it still has several limitations: mismatches between hand-labeled transcriptions and HMM alignment labels, which can lead to discontinuities in the synthetic speech, and the need for hand-labeled bootstrap data in HMM initialization. This paper introduces a new approach to automatic segmentation that aims both to minimize human intervention and to achieve higher segmental quality of synthetic speech in unit-concatenative speech synthesis, by combining a conventional HMM-based approach with spectral boundary correction. A preference test demonstrates that the proposed method is effective in reducing discontinuities in synthetic speech.


Bibliographic reference. Kim, Yeon-Jun / Conkie, Alistair (2002): "Automatic segmentation combining an HMM-based approach and spectral boundary correction", in ICSLP-2002, 145-148.