ISCA Archive SSW 1990

A multi-language text-to-speech system using neural networks

Tatsuro Matsumoto, Yukiko Yamaguchi

This paper presents the design philosophy and performance of two components of our multi-language text-to-speech system. A syntactic boundary neural network is trained on many five-word sequences and is used to determine the syntactic boundaries that exist before the middle word of a given word sequence. A letter-to-phoneme conversion neural network converts input letters to phonemes. To ensure reliability, we employed multiple networks and a unification layer. Performance evaluation for English shows that the syntactic boundary neural network located the syntactic boundaries with 96% accuracy (trained on 500 sentences and tested on another 500 sentences), and that the letter-to-phoneme conversion neural network converted letters to phonemes with 85% accuracy (trained on 1000 words and tested on another 1000 words).
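
As an illustration of the five-word input described in the abstract, the sketch below shows one way a sentence could be cut into five-word sequences, each centered on the word whose preceding boundary is to be classified. It is a minimal sketch, not the authors' implementation: the function name `five_word_windows`, the `<pad>` token, and the choice to pad sentence edges are all assumptions, since the abstract does not say how words near the start or end of a sentence are handled.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token; the abstract does not specify edge handling


def five_word_windows(words: List[str], pad: str = PAD) -> List[Tuple[str, ...]]:
    """Return one five-word window per input word, with that word in the middle."""
    # Two pads on each side so every word can sit at index 2 of its window.
    padded = [pad, pad] + list(words) + [pad, pad]
    return [tuple(padded[i:i + 5]) for i in range(len(words))]


windows = five_word_windows("the cat sat on the mat".split())
for window in windows:
    # window[2] is the middle word; a classifier would predict the boundary before it.
    print(window)
```

Each window would then be encoded and passed to a classifier that predicts the boundary before `window[2]`; the encoding and the network itself are outside the scope of this sketch.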


Cite as: Matsumoto, T., Yamaguchi, Y. (1990) A multi-language text-to-speech system using neural networks. Proc. First ESCA Workshop on Speech Synthesis (SSW 1), 269-272

@inproceedings{matsumoto90_ssw,
  author={Tatsuro Matsumoto and Yukiko Yamaguchi},
  title={{A multi-language text-to-speech system using neural networks}},
  year=1990,
  booktitle={Proc. First ESCA Workshop on Speech Synthesis (SSW 1)},
  pages={269--272}
}