ISCA Archive SSW 1998

Prosody-based unit selection for Japanese speech synthesis

Ken Fujisawa, Nick Campbell

A corpus-based concatenative speech synthesis system that uses no signal processing can produce intelligible synthetic speech while preserving the characteristics of the original voice. In such a system, it is very important to select waveform segments whose natural prosody is close to the target prosody. With a database of limited size, however, it can be difficult to realize natural prosody.

This paper describes an approach to unit (waveform segment) selection that improves intonation. We analyzed the pitch patterns of 503 sentences of read speech from a female Japanese speaker and obtained the F0 range of natural prosody. We then applied this range as a restriction on unit selection in the concatenative speech synthesizer. Subjective experiments confirmed that this measure significantly improved the intonational naturalness of the synthetic speech.
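The abstract does not say how the F0 range was derived or how the restriction enters the selection cost, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the range is taken as nearest-rank percentiles of corpus F0 values, that each candidate carries a precomputed selection cost, and that out-of-range candidates are simply excluded. The names `Unit`, `f0_range`, and `select_unit`, and the percentile thresholds, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Unit:
    """A candidate waveform segment (hypothetical structure)."""
    label: str      # phone label of the segment
    mean_f0: float  # mean F0 of the segment in Hz
    cost: float     # precomputed selection cost (lower is better)


def f0_range(f0_values: Sequence[float],
             lo_pct: float = 5.0,
             hi_pct: float = 95.0) -> Tuple[float, float]:
    """Estimate an allowed F0 range from corpus values.

    Assumption: nearest-rank percentiles stand in for whatever
    analysis the authors actually performed.
    """
    ordered = sorted(f0_values)
    last = len(ordered) - 1

    def pick(pct: float) -> float:
        idx = min(last, max(0, round(pct / 100 * last)))
        return ordered[idx]

    return pick(lo_pct), pick(hi_pct)


def select_unit(candidates: List[Unit],
                allowed: Tuple[float, float]) -> Unit:
    """Pick the lowest-cost candidate whose mean F0 lies in the allowed range.

    If no candidate is in range, fall back to the full candidate list
    so that selection never fails (an assumption, not from the paper).
    """
    lo, hi = allowed
    in_range = [u for u in candidates if lo <= u.mean_f0 <= hi]
    pool = in_range if in_range else candidates
    return min(pool, key=lambda u: u.cost)
```

With this sketch, a candidate that has the lowest cost but an implausible F0 is rejected in favour of the best in-range candidate; the fallback only applies when the database offers no in-range candidate at all.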


Cite as: Fujisawa, K., Campbell, N. (1998) Prosody-based unit selection for Japanese speech synthesis. Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3), 181-184

@inproceedings{fujisawa98_ssw,
  author={Ken Fujisawa and Nick Campbell},
  title={{Prosody-based unit selection for Japanese speech synthesis}},
  year=1998,
  booktitle={Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3)},
  pages={181--184}
}