This paper presents the design and performance of two components of our multi-language text-to-speech system. A syntactic boundary neural network is trained on many five-word sequences and is used to determine the syntactic boundaries that precede the middle word of a given word sequence. A letter-to-phoneme conversion neural network converts input letters to phonemes. To ensure reliability, we employed multiple networks and a unification layer. In an evaluation on English, the syntactic boundary neural network located syntactic boundaries with 96% accuracy (trained on 500 sentences and tested on another 500 sentences), and the letter-to-phoneme conversion neural network converted letters to phonemes with 85% accuracy (trained on 1000 words and tested on another 1000 words).
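As a rough illustration of the five-word input described above, the sketch below builds one window per word, with that word in the middle position. The abstract does not say how words near the start or end of a sentence are handled, so the padding token and the function name `five_word_windows` are assumptions made only for this example, not the authors' method.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token; not specified in the abstract


def five_word_windows(words: List[str], pad: str = PAD) -> List[Tuple[str, ...]]:
    """Return one five-word window per input word, centered on that word.

    Two padding tokens are added on each side so that the first and last
    words of the sentence can also occupy the middle position.
    """
    padded = [pad, pad] + list(words) + [pad, pad]
    return [tuple(padded[i:i + 5]) for i in range(len(words))]


sentence = ["the", "cat", "sat", "on", "the", "mat"]
windows = five_word_windows(sentence)

for window in windows:
    # window[2] is the middle word; a boundary classifier would be asked
    # about the boundary immediately before it.
    print(window)
```

Each window could then be encoded and passed to a classifier that predicts the boundary before `window[2]`; the encoding and the network itself are outside the scope of this sketch.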
Cite as: Matsumoto, T., Yamaguchi, Y. (1990) A multi-language text-to-speech system using neural networks. Proc. First ESCA Workshop on Speech Synthesis (SSW 1), 269-272
@inproceedings{matsumoto90_ssw,
  author={Tatsuro Matsumoto and Yukiko Yamaguchi},
  title={{A multi-language text-to-speech system using neural networks}},
  year=1990,
  booktitle={Proc. First ESCA Workshop on Speech Synthesis (SSW 1)},
  pages={269--272}
}