Sixth European Conference on Speech Communication and Technology
This paper proposes a new waveform-based speech synthesis method for high-quality Japanese text-to-speech (TTS). The method uses the V-CV unit as its basic synthesis unit to preserve the intelligibility of consonants. A newly adopted, efficient unit reconstruction method minimizes both pitch conversion and concatenation distortion when selecting waveforms. This minimization can make the synthesized speech more fluent. Furthermore, the proposed method makes it possible to build a compact waveform dictionary while maintaining high-quality synthesized speech. Using the method's waveform generation function, the waveform dictionary can be drastically reduced, to 1/40 of its original size. An experimental evaluation with 32 non-expert listeners showed that the proposed V-CV speech synthesis method attains a high intelligibility of 97%.
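The abstract does not say how the minimization over the unit sequence is carried out. As an illustration only, the sketch below uses a standard dynamic-programming (Viterbi-style) search, which is an assumption and not necessarily the authors' procedure. Every name here is hypothetical: a candidate is modelled as a tuple `(pitch_hz, start_feature, end_feature)`, pitch conversion cost as the absolute difference from a target pitch, and concatenation distortion as the absolute mismatch between the end feature of one unit and the start feature of the next.

```python
from typing import List, Sequence, Tuple

# Hypothetical candidate waveform: (pitch_hz, start_feature, end_feature).
Candidate = Tuple[float, float, float]


def select_units(
    candidates: Sequence[Sequence[Candidate]],
    target_pitch: Sequence[float],
    w_pitch: float = 1.0,
    w_concat: float = 1.0,
) -> Tuple[List[int], float]:
    """Pick one candidate per unit position so that the summed pitch
    conversion and concatenation costs are minimal over the whole sequence.

    Returns (index of the chosen candidate at each position, total cost).
    """
    if not candidates or len(candidates) != len(target_pitch):
        raise ValueError("need one target pitch per unit position")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    def pitch_cost(cand: Candidate, target: float) -> float:
        # Assumed proxy: how far the stored pitch is from the target.
        return abs(cand[0] - target)

    def concat_cost(prev: Candidate, cur: Candidate) -> float:
        # Assumed proxy: feature mismatch at the joint between two units.
        return abs(prev[2] - cur[1])

    # best[i] = minimal cost of any path ending in candidate i at position t.
    best = [w_pitch * pitch_cost(c, target_pitch[0]) for c in candidates[0]]
    back: List[List[int]] = []  # back-pointers for positions 1..n-1

    for t in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for cur in candidates[t]:
            local = w_pitch * pitch_cost(cur, target_pitch[t])
            costs = [
                best[j] + w_concat * concat_cost(prev, cur)
                for j, prev in enumerate(candidates[t - 1])
            ]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min] + local)
            pointers.append(j_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last position.
    last = min(range(len(best)), key=best.__getitem__)
    total = best[last]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, total
```

The point of searching the whole sequence rather than choosing each unit greedily is that a candidate needing slightly more pitch conversion can still be the better choice if it joins its neighbour with much less distortion. The weights `w_pitch` and `w_concat` are likewise illustrative and would have to be tuned for real data.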
Bibliographic reference. Koyama, Takao / Takahashi, Jun-ichi (1999): "A V-CV waveform based speech synthesis using global minimization of pitch conversion and concatenation distortion in V-CV unit sequence", In EUROSPEECH'99, 2311-2314.