Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Unit Selection Using Pitch Synchronous Cross Correlation for Japanese Concatenative Speech Synthesis

Nobuo Nukaga, Ryota Kamoshida, Kenji Nagamatsu

Hitachi Ltd., Central Research Laboratory, Japan

We describe a corpus-based approach to improving synthesized speech quality and present two cost functions for unit selection. The first is a pitch-synchronous cross correlation used as a concatenation cost, which reduces the noise caused by phase mismatch at concatenation points. The second is a discontinuous cost function, applied to both internal and concatenation costs, which eliminates unnecessary cost calculations. An evaluation showed that incorporating the pitch-synchronous cross correlation cost gave better results than a conventional cost function. In addition, an opinion test assessing the naturalness of the synthesized speech indicated that the proposed method scored 0.7 points higher than the conventional system on a seven-point MOS (Mean Opinion Score) scale. This paper also discusses other improvements in the performance of text-to-speech systems. In this session, we will demonstrate our Japanese text-to-speech system.
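
As a rough illustration of how a pitch-synchronous cross correlation could serve as a concatenation cost, the sketch below compares one frame around the last pitch mark of the left unit with one frame around the first pitch mark of the right unit. It is not the authors' formulation: the function names, the zero-lag normalized correlation, the fixed frame half-width, and the mapping cost = 1 - correlation are all assumptions made for this example.

```python
import math


def extract_frame(signal, pitch_mark, half_width):
    """Return the samples in [pitch_mark - half_width, pitch_mark + half_width).

    Positions outside the signal are zero-padded (an assumption of this sketch).
    """
    frame = []
    for i in range(pitch_mark - half_width, pitch_mark + half_width):
        frame.append(signal[i] if 0 <= i < len(signal) else 0.0)
    return frame


def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross correlation of two equal-length frames, in [-1, 1]."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # A silent frame carries no phase information; treat it as uncorrelated.
    return numerator / denominator if denominator > 0.0 else 0.0


def pscc_concatenation_cost(left, left_mark, right, right_mark, half_width):
    """Hypothetical concatenation cost: 0 for phase-aligned frames, 2 for inverted ones."""
    a = extract_frame(left, left_mark, half_width)
    b = extract_frame(right, right_mark, half_width)
    return 1.0 - normalized_cross_correlation(a, b)


if __name__ == "__main__":
    period = 20
    in_phase = [math.sin(2 * math.pi * n / period) for n in range(100)]
    shifted = [math.sin(2 * math.pi * (n + period // 2) / period) for n in range(100)]
    print(pscc_concatenation_cost(in_phase, 50, in_phase, 50, period // 2))
    print(pscc_concatenation_cost(in_phase, 50, shifted, 50, period // 2))
```

Under these assumptions, two candidate units whose waveforms line up in phase at the join receive a cost near zero, while a half-period phase offset drives the cost toward its maximum, which is the kind of mismatch the abstract associates with audible noise at concatenation points.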

