First International Conference on Spoken Language Processing (ICSLP 90)
A Japanese text-to-speech conversion system has been developed that generates highly intelligible and natural synthetic speech from arbitrary text written in Kanji characters (Chinese ideographs) by concatenating CV (C: consonant, V: vowel) and VC speech units. The system consists of a text analysis system and a speech synthesizer, both built on compact hardware for a personal computer. To generate high-quality synthetic speech, a pitch-controlled residual wave excitation method is proposed, which uses residual waves as excitation signals for a synthesis filter in all portions of each speech unit. To realize natural rhythm, a phoneme duration rule has been created, based on statistical analysis of a large speech database. Evaluation experiments were carried out for the synthesizer: the 100-syllable articulation test gave an accuracy of 88.8%, and the intelligibility test on 1,000 phonetically balanced words gave an accuracy of 97.4%.
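The idea of using a residual wave as the excitation signal for a synthesis filter can be illustrated with a minimal sketch. It assumes a plain linear-prediction (all-pole) model, which the abstract does not confirm; the function names `lpc_residual` and `synthesize` and the coefficient values are hypothetical. The pitch control step is not described in the abstract and is deliberately left out.

```python
# Illustrative sketch only: assumes a linear-prediction (all-pole) model,
# which is an assumption and not stated in the abstract.

def lpc_residual(signal, coeffs):
    """Inverse-filter `signal` with predictor coefficients to get the residual wave."""
    residual = []
    for n, sample in enumerate(signal):
        # Predict the current sample from the previous len(coeffs) input samples.
        prediction = sum(
            a * signal[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(sample - prediction)
    return residual


def synthesize(residual, coeffs):
    """Drive the all-pole synthesis filter with `residual` as the excitation."""
    output = []
    for n, excitation in enumerate(residual):
        # Feed back previously synthesized samples through the same predictor.
        prediction = sum(
            a * output[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        output.append(excitation + prediction)
    return output


# Hypothetical stand-in for one speech unit and its predictor coefficients.
unit = [0.0, 0.5, 1.0, 0.3, -0.7, -0.2, 0.4]
coeffs = [1.3, -0.4]  # stable: filter poles at 0.8 and 0.5

excitation = lpc_residual(unit, coeffs)
rebuilt = synthesize(excitation, coeffs)
```

Because the residual keeps everything the predictor could not explain, feeding it back through the matching synthesis filter reproduces the original unit up to rounding error. That is one plausible reason residual excitation would sound more natural than an idealized pulse or noise source.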
Bibliographic reference. Iwata, Kazuhiko / Mitome, Yukio / Kametani, Jun / Akamatsu, Minoru / Tomotake, Seimitsu / Ozawa, Kazunori / Watanabe, Takao (1990): "A rule-based speech synthesizer using pitch controlled residual wave excitation method", In ICSLP-1990, 185-188.