7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Phonetic Normalization Using z-Score in Segmental Prosody Estimation for Corpus-Based TTS System

Hoeun Song (1), Jaein Kim (1), Kyongrok Lee (2), Jinyoung Kim (2)

(1) Korea Telecom, Korea; (2) Chonnam National University, Korea

Corpus-based text-to-speech (CB-TTS) has recently been studied actively throughout the world. Statistical training methods are generally applied to derive prosodic rules in CB-TTS, and the classification and regression tree (CART) is one of the most widely used. In this paper, we present an efficient CART training approach based on z-score phonetic normalization. The idea comes from the fact that the three most important parameters in CART training for segmental prosody are the phone itself and its left and right neighboring phones, especially in Korean. Our approach effectively reduces the number of CART terminal nodes: by approximately 14-94% for segmental duration estimation and 45-70% for intensity estimation. The experimental results also show that phonetic normalization slightly reduces the estimation errors.
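
As a rough illustration of z-score normalization applied to a segmental feature, the sketch below normalizes each duration by the mean and standard deviation of its own phone class, so that a regression tree could be trained on the normalized value instead of the raw one. The abstract does not specify the grouping or the statistics used; grouping by phone identity alone, the function names (fit_phone_stats, to_z, from_z), and the sample durations are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_phone_stats(samples):
    """Return {phone: (mean, std)} from an iterable of (phone, value) pairs."""
    by_phone = defaultdict(list)
    for phone, value in samples:
        by_phone[phone].append(value)
    stats = {}
    for phone, values in by_phone.items():
        mu = mean(values)
        sigma = pstdev(values)
        # Guard against zero spread (single sample or identical values).
        stats[phone] = (mu, sigma if sigma > 0 else 1.0)
    return stats


def to_z(phone, value, stats):
    """Normalize a raw value to a z-score within its phone class."""
    mu, sigma = stats[phone]
    return (value - mu) / sigma


def from_z(phone, z, stats):
    """Map a predicted z-score back to the raw scale of the phone class."""
    mu, sigma = stats[phone]
    return mu + z * sigma


# Hypothetical durations in milliseconds, for illustration only.
samples = [("a", 80.0), ("a", 100.0), ("a", 120.0), ("s", 60.0), ("s", 70.0)]
stats = fit_phone_stats(samples)
z = to_z("a", 120.0, stats)
restored = from_z("a", z, stats)
```

In such a setup the tree would predict the z-score, and from_z would convert the prediction back to a duration or intensity for the target phone at synthesis time.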



Bibliographic reference. Song, Hoeun / Kim, Jaein / Lee, Kyongrok / Kim, Jinyoung (2002): "Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system", in ICSLP-2002, 2393-2396.