5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A Linguistic and Prosodic Database for Data-Driven Japanese TTS Synthesis

Atsuhiro Sakurai, Takashi Natsume, Keikichi Hirose

Dep. of Information and Communication Engineering, The Univ. of Tokyo, Japan

We propose a method for building a database that pairs a parametric representation of F0 contours with linguistic and acoustic information, for use in data-driven Japanese text-to-speech (TTS) systems. The database contains recorded speech, F0 contours and their parametric labels, phonetic transcriptions with durations, and further linguistic information such as orthographic transcriptions, part-of-speech (POS) tags, and accent types. All information that cannot be obtained by dictionary lookup is derived automatically. In particular, we describe a method for automatically deriving the parametric labels that describe F0 contours, based on a superpositional model. Preliminary tests on a small data set show that the method finds the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.
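As a rough illustration of what a parametric label under a superpositional model might encode, the sketch below assumes a Fujisaki-style command-response formulation, in which log F0 is the sum of a baseline, phrase components, and accent components. The abstract does not name the specific model, so the formulation, the class and function names, and the default constants (alpha = 3.0, beta = 20.0, gamma = 0.9) are all assumptions chosen for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    """Hypothetical label: an impulse-like phrase command."""
    onset: float      # command time in seconds
    magnitude: float  # command magnitude


@dataclass
class AccentCommand:
    """Hypothetical label: a step-like accent command."""
    onset: float      # start time in seconds
    offset: float     # end time in seconds
    amplitude: float  # command amplitude


def phrase_response(t, alpha=3.0):
    # Impulse response of the phrase control mechanism (assumed form).
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    # Step response of the accent control mechanism, capped at gamma.
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrases, accents):
    """Superpose phrase and accent components on a baseline in the log-F0 domain."""
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for p in phrases:
            log_f0 += p.magnitude * phrase_response(t - p.onset)
        for a in accents:
            log_f0 += a.amplitude * (
                accent_response(t - a.onset) - accent_response(t - a.offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Example with made-up command values: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]
f0 = f0_contour(
    times,
    base_f0=120.0,
    phrases=[PhraseCommand(onset=0.0, magnitude=0.5)],
    accents=[AccentCommand(onset=0.3, offset=0.7, amplitude=0.4)],
)
```

Under this assumption, the parametric labels stored in the database would be the command timings and magnitudes, and automatic labeling amounts to finding the commands whose generated contour best matches the observed F0.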

