September 22-25, 1997
A corpus-based concatenative speech synthesis system that uses no signal processing can produce intelligible synthetic speech that maintains the original voice characteristics, but natural prosody can sometimes be difficult to realize. In such a concatenative system, it is very important to select waveform segments that are naturally close to the target prosody. This paper describes some approaches to unit selection for improving the prosody, especially the intonation, of such synthetic speech. If the unit selection measures for the fundamental frequency (F0) are insufficient, the concatenative system may produce speech with a discontinuous F0 pattern. Our proposed solution to this problem is to add extra measures for selecting units that form a smoother, more continuous F0 contour. Through subjective experiments, we confirmed that each of these measures effectively improved intonation naturalness.
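The general idea of adding an F0 continuity measure to unit selection can be illustrated with a minimal sketch. It is not the measures proposed in the paper, which the abstract does not specify. It assumes each candidate unit is reduced to a hypothetical pair (F0 at its start, F0 at its end, in Hz), that the target cost is the distance between the unit's mean F0 and the target F0, and that the added continuity cost is the F0 jump at each concatenation point. The names `select_units`, `w_target`, and `w_join` are illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical simplification: a unit is (F0 at start, F0 at end), in Hz.
Unit = Tuple[float, float]


def select_units(
    targets: Sequence[float],
    candidates: Sequence[Sequence[Unit]],
    w_target: float = 1.0,
    w_join: float = 1.0,
) -> List[int]:
    """Pick one candidate index per position by dynamic programming.

    The total cost is the weighted sum of an assumed target cost (mean F0
    of the unit vs. target F0) and an assumed continuity cost (F0 jump
    between the end of one unit and the start of the next).
    """

    def target_cost(unit: Unit, f0: float) -> float:
        return abs((unit[0] + unit[1]) / 2.0 - f0)

    def join_cost(prev: Unit, cur: Unit) -> float:
        return abs(prev[1] - cur[0])

    # best[i][j]: cheapest total cost of a path ending in candidate j at i.
    best = [[w_target * target_cost(u, targets[0]) for u in candidates[0]]]
    back: List[List[int]] = []

    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [
                best[-1][k] + w_join * join_cost(p, u)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + w_target * target_cost(u, targets[i]))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


if __name__ == "__main__":
    targets = [100.0, 110.0]
    candidates = [
        [(100.0, 100.0), (90.0, 110.0)],  # both have mean F0 of 100 Hz
        [(110.0, 110.0)],
    ]
    print(select_units(targets, candidates))              # with continuity cost
    print(select_units(targets, candidates, w_join=0.0))  # target cost only
```

In this toy input both first-position candidates match the target equally well, so a target-only search cannot tell them apart; once the continuity term is switched on, the search prefers the candidate whose final F0 meets the next unit's initial F0 without a jump.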
Bibliographic reference. Fujisawa, Ken / Hirai, Toshio / Higuchi, Norio (1997): "Use of pitch pattern improvement in the CHATR speech synthesis system", In EUROSPEECH-1997, 2671-2674.