7th International Conference on Spoken Language Processing
September 16-20, 2002
Currently, AT&T Labs' Natural Voices multilingual TTS system produces high-quality synthetic speech using a large-scale speech corpus. In the development of such systems, automatic segmentation is a major component technology. The prevalent approach to automatic segmentation in speech synthesis is based on Hidden Markov Models (HMMs). Although the HMM-based approach is the most automated and reliable one available, it still has several limitations. These include mismatches between hand-labeled transcriptions and HMM alignment labels, which can lead to discontinuities in the synthetic speech, and the need for hand-labeled bootstrap data for HMM initialization. This paper introduces a new approach to automatic segmentation that combines a conventional HMM-based approach with spectral boundary correction. It aims both to minimize human intervention and to achieve higher segmental quality of synthetic speech in unit-concatenative speech synthesis. A preference test demonstrates that the proposed method is effective in reducing discontinuities in synthetic speech.
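The abstract does not spell out how the spectral boundary correction works, so the following is only a hypothetical sketch of one common form of boundary refinement, not the authors' method. It assumes that each boundary from HMM forced alignment is shifted within a small window of `window` frames to the frame where the Euclidean distance between consecutive spectral feature vectors (for example, MFCC frames) is largest. The names `spectral_distance` and `refine_boundaries`, the Euclidean metric, and the greedy left-to-right search are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def spectral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two spectral feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def refine_boundaries(frames: List[Sequence[float]],
                      hmm_boundaries: List[int],
                      window: int = 3) -> List[int]:
    """Hypothetical correction: move each HMM boundary to the nearby frame
    with the largest frame-to-frame spectral change.

    A boundary value t means a new segment starts at frame t, so the change
    at t is measured between frames t-1 and t. Precondition: hmm_boundaries
    is strictly increasing and every value lies in [1, len(frames) - 1].
    """
    n = len(frames)
    refined: List[int] = []
    prev = 0  # last refined boundary; keeps the output strictly increasing
    for i, b in enumerate(hmm_boundaries):
        nxt = hmm_boundaries[i + 1] if i + 1 < len(hmm_boundaries) else n
        # Search window: +/- window frames, clipped so boundaries never cross.
        lo = max(1, b - window, prev + 1)
        hi = min(n - 1, b + window, nxt - 1)
        best = max(range(lo, hi + 1),
                   key=lambda t: spectral_distance(frames[t - 1], frames[t]))
        refined.append(best)
        prev = best
    return refined


if __name__ == "__main__":
    # Toy 1-D "spectra": spectral jumps occur at frames 2 and 5.
    toy = [[0.0], [0.0], [3.0], [3.0], [3.0], [9.0], [9.0], [9.0]]
    print(refine_boundaries(toy, [3, 6], window=1))  # prints [2, 5]
```

Here the HMM boundaries at frames 3 and 6 are each pulled one frame earlier, onto the points of maximal spectral change. The search range is clipped by the neighbouring boundaries so that corrected boundaries never cross. A real system would likely use a richer criterion, for instance one tied directly to concatenation cost, and that choice is left open here.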
Bibliographic reference. Kim, Yeon-Jun / Conkie, Alistair (2002): "Automatic segmentation combining an HMM-based approach and spectral boundary correction", In ICSLP-2002, 145-148.