Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

A V-CV Waveform Based Speech Synthesis Using Global Minimization of Pitch Conversion and Concatenation Distortion in V-CV Unit Sequence

Takao Koyama, Jun-ichi Takahashi

NTT Data Corporation, Laboratory for Information Technology, Tokyo, Japan

This paper proposes a new waveform-based speech synthesis method for high-quality Japanese text-to-speech (TTS). The method uses V-CV as its basic synthesis unit to preserve the intelligibility of consonants. An efficient unit reconstruction method is newly adopted to minimize both pitch conversion and concatenation distortion when selecting waveforms; this minimization makes the synthesized speech more fluent. Furthermore, the proposed method makes it possible to build a compact waveform dictionary while maintaining the high quality of the synthesized speech. Using the waveform generation function of the method, the size of the waveform dictionary can be drastically reduced, by a factor of 40. An experimental evaluation with 32 ordinary listeners showed that the proposed V-CV speech synthesis method attained a high intelligibility of 97%.
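
The idea of minimizing pitch conversion and concatenation distortion globally over a V-CV unit sequence can be illustrated with a small dynamic-programming (Viterbi-style) search. This is only a sketch under stated assumptions, not the authors' algorithm: the abstract does not define the cost functions, so the log-F0-ratio pitch cost, the Euclidean boundary distance, the weight `w_concat`, and all function and field names (`pitch_cost`, `concat_cost`, `select_units`, `f0`, `start`, `end`) are hypothetical.

```python
import math


def pitch_cost(candidate_f0, target_f0):
    # Assumed cost: how far the stored waveform's F0 must be shifted
    # to reach the target, measured as an absolute log ratio.
    return abs(math.log(candidate_f0 / target_f0))


def concat_cost(left, right):
    # Assumed cost: Euclidean distance between the feature vector at the
    # end of the left unit and at the start of the right unit.
    return math.dist(left["end"], right["start"])


def select_units(candidates, targets, w_concat=1.0):
    """Pick one candidate per position so that the summed cost is minimal.

    candidates[i] is a list of dicts with keys "f0", "start", "end";
    targets[i] is the target F0 for position i.
    Returns (chosen index per position, total cost).
    """
    n = len(candidates)
    # best[i][k]: minimal cost of any path ending in candidate k at position i
    best = [[pitch_cost(c["f0"], targets[0]) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, n):
        row, ptr = [], []
        for cand in candidates[i]:
            local = pitch_cost(cand["f0"], targets[i])
            costs = [
                best[i - 1][j] + w_concat * concat_cost(prev, cand)
                for j, prev in enumerate(candidates[i - 1])
            ]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[j_min] + local)
            ptr.append(j_min)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path = [k]
    for i in range(n - 1, 0, -1):
        k = back[i][k]
        path.append(k)
    path.reverse()
    return path, total


# Toy example: the first candidate matches the target pitch exactly but
# joins badly; the second needs a small pitch shift but joins smoothly.
candidates = [
    [{"f0": 100.0, "start": (0.0,), "end": (0.0,)},
     {"f0": 110.0, "start": (0.0,), "end": (5.0,)}],
    [{"f0": 100.0, "start": (5.0,), "end": (0.0,)}],
]
path, total = select_units(candidates, targets=[100.0, 100.0])
```

In this toy case a purely local choice based on pitch alone would pick the first candidate at position 0, whereas the global search prefers the second one because the saving in concatenation distortion outweighs the small pitch conversion.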

