7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Designing Japanese Speech Database Covering Wide Range in Prosody for Hybrid Speech Synthesizer

Hiromichi Kawanami (1), Tsuyoshi Masuda (2), Tomoki Toda (1), Kiyohiro Shikano (1)

(1) Nara Institute of Science and Technology, Japan; (2) Asahi Kasei Corporation, Japan

To build a Text-to-Speech (TTS) system that can generate high-quality speech over a wide prosodic range, we constructed a speech database. As the speech synthesizer, we use a hybrid system consisting of a unit selection module and prosody modification by STRAIGHT (a vocoder-type, high-quality analysis-synthesis method). Our aim is to reduce the amount of prosody modification, which causes quality deterioration; in other words, to generate any desired prosody while keeping prosody modification within a permissible range. Based on this view, we designed 9 sub-databases that consist of the same phonetically balanced texts read with different prosody. In this paper, we describe the design policy and general features of the obtained database, as well as the results of listening tests focused on its effectiveness with respect to duration. The results show the advantage of the proposed database, but they also indicate that the unit selection cost function needs to be changed according to the output speech rate.
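
The idea of keeping prosody modification within a permissible range can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 3 x 3 layout of the 9 sub-databases (three F0 levels by three speech rates), the numeric F0 and rate values, the log-ratio cost, the threshold `max_ratio`, and all names such as `SubDatabase` and `select_sub_database`. The sketch picks the sub-database whose recorded prosody is closest to the target, so that the modification step has as little work to do as possible.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SubDatabase:
    name: str
    mean_f0_hz: float    # hypothetical average F0 of the recordings
    mora_per_sec: float  # hypothetical average speech rate


def build_sub_databases():
    # Assumption: 9 sub-databases arranged as 3 F0 levels x 3 speech rates.
    # The numeric values are placeholders, not measurements from the paper.
    f0_levels = {"low": 100.0, "mid": 120.0, "high": 145.0}
    rates = {"slow": 5.5, "normal": 7.5, "fast": 10.0}
    return [
        SubDatabase(f"{f0_name}-{rate_name}", f0, rate)
        for (f0_name, f0), (rate_name, rate) in product(f0_levels.items(), rates.items())
    ]


def modification_ratios(db, target_f0_hz, target_rate):
    # Ratios by which F0 and duration would have to be scaled.
    f0_ratio = target_f0_hz / db.mean_f0_hz
    duration_ratio = db.mora_per_sec / target_rate  # faster target -> shorter units
    return f0_ratio, duration_ratio


def modification_cost(db, target_f0_hz, target_rate):
    # Illustrative cost: sum of absolute log-ratios (0 means no modification).
    f0_ratio, duration_ratio = modification_ratios(db, target_f0_hz, target_rate)
    return abs(math.log(f0_ratio)) + abs(math.log(duration_ratio))


def select_sub_database(dbs, target_f0_hz, target_rate, max_ratio=1.3):
    # Choose the sub-database needing the least modification, and report
    # whether both ratios stay inside the (hypothetical) permissible range.
    best = min(dbs, key=lambda db: modification_cost(db, target_f0_hz, target_rate))
    ratios = modification_ratios(best, target_f0_hz, target_rate)
    within = all(1.0 / max_ratio <= r <= max_ratio for r in ratios)
    return best, within


if __name__ == "__main__":
    dbs = build_sub_databases()
    best, within = select_sub_database(dbs, target_f0_hz=142.0, target_rate=9.5)
    print(best.name, within)
```

In a real hybrid system the choice would be made per unit inside the unit selection cost function rather than per utterance; the sketch only shows why covering several prosodic regions reduces the modification needed.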


Bibliographic reference. Kawanami, Hiromichi / Masuda, Tsuyoshi / Toda, Tomoki / Shikano, Kiyohiro (2002): "Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer", in ICSLP-2002, 2425-2428.