7th International Conference on Spoken Language Processing
September 16-20, 2002
To build a text-to-speech (TTS) system that can generate high-quality speech over a wide prosodic range, we constructed a speech database. The synthesizer is a hybrid system consisting of a unit selection module and prosody modification by STRAIGHT, a vocoder-type high-quality analysis-synthesis method. Our aim is to reduce the amount of prosody modification, which degrades speech quality; in other words, to generate any desired prosody while staying within a permissible range of modification. From this standpoint, we designed 9 sub-databases that consist of the same phonetically balanced texts read with different prosody. In this paper, we describe the design policy and general features of the resulting database, and report listening tests that focus on its effectiveness with respect to duration. The results show the advantage of the proposed database, but they also indicate that the unit selection cost function needs to be changed according to the output speech rate.
Bibliographic reference. Kawanami, Hiromichi / Masuda, Tsuyoshi / Toda, Tomoki / Shikano, Kiyohiro (2002): "Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer", In ICSLP-2002, 2425-2428.