Fifth ISCA ITRW on Speech Synthesis
June 14-16, 2004
We describe a corpus-based approach to improving synthesized speech quality and present two useful cost functions for unit selection. The first is a pitch-synchronous cross correlation used as a concatenation cost, which reduces the noise caused by phase mismatch at concatenation points. The second is a discontinuous cost function, applied to both internal and concatenation costs, which eliminates unnecessary cost calculations. An evaluation showed that incorporating the pitch-synchronous cross correlation cost gave better results than using a conventional cost function. In addition, an opinion test assessing the naturalness of the synthesized speech indicated that the proposed method scored 0.7 points higher than the conventional system on a seven-point MOS (Mean Opinion Score) scale. This paper also discusses other improvements in the performance of text-to-speech systems. In this session, we will demonstrate our Japanese text-to-speech system.
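As an illustration only, the idea of a cross correlation concatenation cost can be sketched as follows. The abstract does not give the exact formulation, so everything here is an assumption: the function names `normalized_xcorr` and `xcorr_concat_cost`, the `max_lag` parameter, the use of one pitch period on each side of the join, and the mapping of correlation to cost as `1 - correlation`. The sketch compares the last pitch period of the left unit with the first pitch period of the right unit; waveforms that are in phase yield a cost near 0, while waveforms in opposite phase yield a cost near 2.

```python
import math


def normalized_xcorr(a, b, lag):
    """Normalized cross correlation of a and b, with b shifted by `lag` samples."""
    # Restrict the sums to the index range where both signals overlap.
    start = max(0, -lag)
    end = min(len(a), len(b) - lag)
    if end <= start:
        return 0.0
    num = sum(a[i] * b[i + lag] for i in range(start, end))
    energy_a = sum(a[i] ** 2 for i in range(start, end))
    energy_b = sum(b[i + lag] ** 2 for i in range(start, end))
    denom = math.sqrt(energy_a * energy_b)
    return num / denom if denom > 0 else 0.0


def xcorr_concat_cost(left_period, right_period, max_lag=0):
    """Hypothetical concatenation cost: 1 minus the best normalized correlation.

    left_period  -- samples of the last pitch period of the left unit
    right_period -- samples of the first pitch period of the right unit
    max_lag      -- assumed search range (in samples) for small alignments
    """
    best = max(
        normalized_xcorr(left_period, right_period, lag)
        for lag in range(-max_lag, max_lag + 1)
    )
    return 1.0 - best


# One period of a sine wave, 40 samples long (synthetic example data).
period = [math.sin(2 * math.pi * n / 40) for n in range(40)]
inverted = [-x for x in period]

in_phase_cost = xcorr_concat_cost(period, period)
anti_phase_cost = xcorr_concat_cost(period, inverted)
```

In a unit-selection search, a cost of this kind would be computed for each candidate pair of adjacent units and combined with the other cost terms; how the authors weight or combine it is not stated in the abstract.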
Bibliographic reference. Nukaga, Nobuo / Kamoshida, Ryota / Nagamatsu, Kenji (2004): "Unit selection using pitch synchronous cross correlation for Japanese concatenative speech synthesis", In SSW5-2004, 43-48.