It has recently been proposed and demonstrated that neural networks, instead of conventional discrete rules, can be applied to speech synthesis from text. This paper concentrates primarily on one subtask of text-to-speech synthesis, namely the computation of phoneme durations, taking into account the complex contextual information of each phoneme. Possible network-based formulations and data representations are discussed in general, and experimental results are presented to demonstrate the feasibility of the approach. The results are promising: in preliminary experiments, duration computation for the Finnish language performs well compared with the rule sets used so far. The more general problem of computing other control parameters of speech synthesis with neural networks is also briefly discussed.
Keywords: Speech synthesis by rule, Neural networks, Prosodic features of speech
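As a rough illustration of what a network-based duration formulation can look like, the sketch below one-hot encodes a phoneme together with its left and right neighbours and trains a small feedforward network (one tanh hidden layer, linear output) to map that context vector to a duration. This is a minimal sketch under stated assumptions, not the authors' network or data representation: the phoneme inventory, the context window of one phoneme per side, the layer sizes, the toy letter strings and the target rule (`synthetic_duration`, including its "phrase-final lengthening") are all hypothetical and exist only so the example has something to learn. No Finnish duration measurements are used.

```python
import math
import random

# Hypothetical toy phoneme inventory; "_" pads positions outside the word.
PHONEMES = ["_", "a", "e", "i", "k", "n", "s", "t"]
VOWELS = {"a", "e", "i"}
INDEX = {p: i for i, p in enumerate(PHONEMES)}


def encode_context(word, pos, window=1):
    """One-hot encode the phoneme at `pos` plus `window` neighbours on each side."""
    vec = []
    for offset in range(-window, window + 1):
        j = pos + offset
        symbol = word[j] if 0 <= j < len(word) else "_"
        one_hot = [0.0] * len(PHONEMES)
        one_hot[INDEX[symbol]] = 1.0
        vec.extend(one_hot)
    return vec


def synthetic_duration(word, pos):
    """Made-up target rule (units of 100 ms), used only to generate toy training data."""
    base = 0.8 if word[pos] in VOWELS else 0.6
    if pos == len(word) - 1:
        base += 0.3  # hypothetical word-final lengthening
    return base


class DurationMLP:
    """One tanh hidden layer and a linear output: context vector -> duration."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One stochastic gradient step on the squared error 0.5 * (y - target)**2."""
        h, y = self.forward(x)
        err = y - target
        for k in range(len(h)):
            # Backpropagate through w2[k] (using its value before the update) and tanh.
            grad_h = err * self.w2[k] * (1.0 - h[k] ** 2)
            self.w2[k] -= lr * err * h[k]
            self.b1[k] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[k][i] -= lr * grad_h * xi
        self.b2 -= lr * err
        return 0.5 * err ** 2


def mean_loss(model, data):
    return sum(0.5 * (model.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


WORDS = ["kata", "sika", "tina", "nasta", "tikka", "keitti"]  # toy letter strings
data = [(encode_context(w, p), synthetic_duration(w, p))
        for w in WORDS for p in range(len(w))]

model = DurationMLP(n_in=len(data[0][0]))
initial_loss = mean_loss(model, data)
for epoch in range(300):
    for x, t in data:
        model.train_step(x, t)
final_loss = mean_loss(model, data)

print(f"mean loss {initial_loss:.4f} -> {final_loss:.4f}")
print("predicted duration of final 'a' in 'kata':",
      round(model.forward(encode_context("kata", 3))[1], 2))
```

Widening `window` lets the network see more context at the cost of a longer input vector ((2 * window + 1) * len(PHONEMES) inputs); a real system would also add features such as stress, syllable position or phrase boundaries rather than phoneme identity alone.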
Bibliographic reference. Karjalainen, Matti / Altosaar, Toomas (1991): "Phoneme duration rules for speech synthesis by neural networks", In EUROSPEECH-1991, 633-636.