4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
In this paper, we propose a new model for synthesizing fundamental frequency (F0) contours using stylization and a neural network learning method. The F0 contour is described as the superposition of four layered features: the global tune, word pitch bias, lexical tone, and syllabic pitch pattern. We first stylize the F0 contours of the speech material and then analyze the stylized data statistically according to their grammatical attributes. Next, we construct a melodic table and train a neural network to model lexical tone. Finally, we develop intonation generation rules for text-to-speech (TTS) conversion. The model produces good neutral declarative intonation. When tested with our TD-PSOLA synthesizer, speech synthesized with the rule-generated F0 contour differs little from speech synthesized with the original F0 contour.
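The four-layer superposition idea can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions that are not stated in the abstract. The layers are assumed to add directly in Hz; adding them in log-F0 would be an equally plausible choice. The global tune is modeled as a linear declination line. Each syllable carries a single lexical-tone offset and a short list of pitch-pattern offsets. The names `Syllable`, `global_tune`, and `synthesize_f0` are hypothetical, and the numbers stand in for values that the paper derives from its melodic table and neural network.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    word_index: int       # index of the word this syllable belongs to
    tone_offset: float    # hypothetical lexical-tone offset (Hz)
    pattern: list[float]  # hypothetical syllabic pitch-pattern offsets (Hz), one per sample


def global_tune(t: float, start_hz: float, end_hz: float) -> float:
    """Assumed global tune: a linear declination line over normalized time t in [0, 1]."""
    return start_hz + (end_hz - start_hz) * t


def synthesize_f0(syllables: list[Syllable], word_bias: list[float],
                  start_hz: float = 220.0, end_hz: float = 160.0) -> list[float]:
    """Sum the four layers (assumed additive in Hz) at each sample point."""
    n = len(syllables)
    contour = []
    for i, syl in enumerate(syllables):
        k = len(syl.pattern)
        for j, offset in enumerate(syl.pattern):
            t = (i + j / k) / n  # normalized utterance time of this sample
            f0 = (global_tune(t, start_hz, end_hz)   # layer 1: global tune
                  + word_bias[syl.word_index]        # layer 2: word pitch bias
                  + syl.tone_offset                  # layer 3: lexical tone
                  + offset)                          # layer 4: syllabic pitch pattern
            contour.append(f0)
    return contour


if __name__ == "__main__":
    syls = [Syllable(0, 0.0, [0.0, 5.0]), Syllable(0, -5.0, [0.0, 5.0])]
    print(synthesize_f0(syls, [10.0], start_hz=200.0, end_hz=100.0))
    # prints [210.0, 190.0, 155.0, 135.0]
```

Keeping each layer as a separate additive term means any one of them can be replaced independently. For example, the fixed `tone_offset` could be swapped for a trained model's prediction without touching the declination line or the word-level bias.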
Bibliographic reference. Lee, Jung-Chul / Lee, Youngjik / Kim, Sang-Hun / Hahn, Minsoo (1996): "Intonation processing for TTS using stylization and neural network learning method", In ICSLP-1996, 1381-1384.