INTERSPEECH 2004 - ICSLP
To generate more natural synthetic speech with a Korean TTS (Text-To-Speech) system, we need to know all the possible prosodic rules of the Korean language. These rules can be extracted from linguistic and phonetic knowledge or by analyzing real speech. In general, such rules are integrated into a prosody-generation algorithm in a TTS system. However, this algorithm cannot cover all the prosodic rules of a language and is not perfect, so the quality of the synthesized speech is not as good as we would expect. We therefore propose artificial neural networks (ANNs) that can learn the prosodic rules of the Korean language. A Multi-Layer Perceptron (MLP) trained with the error Back-Propagation (BP) algorithm was selected as the ANN for this study. To train and test the ANNs, we built a corpus of meaningful sentences constructed from a corpus of phonetically balanced (PB) isolated words. These sentences were read by one male speaker, recorded, and collected into a speech database. We analyzed the recorded speech to extract the prosodic information of each phoneme and built target and test patterns for the ANNs. We found that ANNs could learn prosody from real speech and generate the prosody of a sentence given as input.
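The setup described above, an MLP trained with error back-propagation to map per-phoneme features to prosodic targets, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input feature layout (a generic "phoneme context" vector) and the two output targets (normalized duration and F0) are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ProsodyMLP:
    """One-hidden-layer perceptron trained with error back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)
        self.y = sigmoid(self.h @ self.W2 + self.b2)
        return self.y

    def backward(self, X, t):
        # Back-propagation of squared error through sigmoid units.
        dy = (self.y - t) * self.y * (1 - self.y)
        dh = (dy @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= self.lr * self.h.T @ dy
        self.b2 -= self.lr * dy.sum(axis=0)
        self.W1 -= self.lr * X.T @ dh
        self.b1 -= self.lr * dh.sum(axis=0)

# Toy training data: 8 hypothetical "phoneme context" vectors mapped to
# (duration, F0) targets, both normalized to [0, 1]. A real system would
# encode phoneme identity, position in the word/phrase, and similar cues.
X = rng.random((8, 6))
T = rng.random((8, 2))

net = ProsodyMLP(n_in=6, n_hidden=10, n_out=2)
losses = []
for epoch in range(500):
    y = net.forward(X)
    losses.append(float(np.mean((y - T) ** 2)))
    net.backward(X, T)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Full-batch gradient descent suffices at this toy scale; the decreasing squared error shows the network fitting the per-phoneme prosodic targets, which mirrors the paper's finding that an MLP can learn prosody directly from analyzed speech.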
Bibliographic reference. Min, Kyung-Joong / Lim, Un-Cheon (2004): "Korean prosody generation and artificial neural networks", In INTERSPEECH-2004, 1869-1872.