4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Intonation Processing for TTS Using Stylization and Neural Network Learning Method

Jung-Chul Lee, Youngjik Lee, Sang-Hun Kim, Minsoo Hahn

Electronics and Telecommunications Research Institute, Taejon, Korea

In this paper, we propose a new model for synthesizing fundamental frequency (F0) contours using stylization and a neural network learning method. The F0 contour is described as the superposition of four layered features: global tune, word pitch bias, lexical tone, and syllabic pitch pattern. We first stylize the F0 contours of the speech material and statistically analyze the stylized data according to grammatical attributes. We then construct a melodic table and model lexical tone with a trained neural network. Finally, we develop intonation generation rules for TTS conversion. The model produces good neutral declarative intonation, and there is little difference between speech synthesized with the original F0 contour and speech synthesized with the rule-generated contour when tested with our TD-PSOLA synthesizer [6][7].
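
The layered description above can be illustrated with a minimal sketch. It assumes the four layers are sampled on a common frame grid and combine by plain addition in Hz. The abstract says only "superposition", so the additive form, the function name `superpose_f0`, and all numeric values are hypothetical and chosen for illustration, not taken from the paper.

```python
from typing import List, Sequence


def superpose_f0(
    global_tune: Sequence[float],
    word_pitch_bias: Sequence[float],
    lexical_tone: Sequence[float],
    syllabic_pattern: Sequence[float],
) -> List[float]:
    """Sum four per-frame F0 layers into one contour.

    Assumption: layers are additive in Hz and share one frame grid.
    """
    layers = (global_tune, word_pitch_bias, lexical_tone, syllabic_pattern)
    n_frames = len(global_tune)
    if any(len(layer) != n_frames for layer in layers):
        raise ValueError("all layers must have the same number of frames")
    # Add the four layer values frame by frame.
    return [sum(frame_values) for frame_values in zip(*layers)]


# Illustrative values only (Hz); not data from the paper.
global_tune = [120.0, 118.0, 116.0, 114.0]    # slow declination over the utterance
word_pitch_bias = [5.0, 5.0, -3.0, -3.0]      # one offset per word, two words here
lexical_tone = [2.0, -2.0, 2.0, -2.0]         # tone-dependent deviation
syllabic_pattern = [0.0, 1.0, 0.0, 1.0]       # fine movement inside each syllable

contour = superpose_f0(global_tune, word_pitch_bias, lexical_tone, syllabic_pattern)
print(contour)
```

In a sketch like this, each layer could come from a different source (for example a table lookup or a trained predictor), and only the final sum would be passed to the synthesizer.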

Bibliographic reference. Lee, Jung-Chul / Lee, Youngjik / Kim, Sang-Hun / Hahn, Minsoo (1996): "Intonation processing for TTS using stylization and neural network learning method", in ICSLP-1996, 1381-1384.