8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Number of Output Nodes of Artificial Neural Networks for Korean Prosody Generation

Kyung-Joong Min (1), Chan-Goo Kang (2), Un-Cheon Lim (1)

(1) Hoseo University, Korea
(2) Anyang Technical College, Korea

We have been studying artificial neural networks (ANNs) that can learn and generate the prosody of Korean sentences. For a Korean text-to-speech (TTS) system to produce more natural synthetic speech, one would need to know all the prosodic rules of the Korean language and integrate them into a single algorithm. Such rules can be derived from linguistic and phonetic knowledge or by analyzing real speech. In practice, however, a rule-based algorithm cannot cover every prosodic rule of a language, so the quality of the synthesized speech falls short of expectations. We trained back-propagation (BP) ANNs to learn the energy contour and the pitch contour of each phoneme in a sentence and to generate the polynomial parameters of these contours for use in a TTS system. The prosodic contours of a phoneme can be approximated by polynomial equations, and the order of the polynomials can be chosen according to various conditions. In this paper, we compare the performance of ANNs with different numbers of output nodes.
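The link between polynomial order and the number of output nodes can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the normalized time axis, the sample contour values, and the assumption that a single network emits one output node per polynomial coefficient for both the pitch and the energy contour are all hypothetical choices made for illustration.

```python
def fit_polynomial(times, values, order):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = order + 1
    # Normal equations A c = b, with A[i][j] = sum(t^(i+j)), b[i] = sum(v * t^i).
    a = [[sum(t ** (i + j) for t in times) for j in range(n)] for i in range(n)]
    b = [sum(v * t ** i for t, v in zip(times, values)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def evaluate(coeffs, t):
    """Evaluate the polynomial at time t using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def output_node_count(pitch_order, energy_order):
    """Assumed layout: one output node per coefficient of each contour."""
    return (pitch_order + 1) + (energy_order + 1)


# Hypothetical pitch contour of one phoneme (Hz), sampled on a time axis
# normalized to [0, 1]. The values are made up for illustration.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
pitch = [120.0 + 40.0 * t - 30.0 * t * t for t in times]

pitch_coeffs = fit_polynomial(times, pitch, order=2)
reconstructed = [evaluate(pitch_coeffs, t) for t in times]
nodes = output_node_count(pitch_order=2, energy_order=2)
```

Under this assumed layout, raising the polynomial order gives a closer fit to each contour but also increases the number of targets the network must learn, which is the kind of trade-off a comparison across output-node counts would examine.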


Bibliographic reference. Min, Kyung-Joong / Kang, Chan-Goo / Lim, Un-Cheon (2004): "Number of output nodes of artificial neural networks for Korean prosody generation", in INTERSPEECH-2004, 1901-1904.