ISCA Archive SSW 2004

Unit selection using pitch synchronous cross correlation for Japanese concatenative speech synthesis

Nobuo Nukaga, Ryota Kamoshida, Kenji Nagamatsu

We describe a corpus-based approach to improving synthesized speech quality and present two useful cost functions for unit selection. One is a pitch-synchronous cross correlation for concatenation costs, which reduces the noise caused by phase mismatch at concatenation points. The other is a discontinuous cost function for internal and concatenation costs, which eliminates unnecessary cost calculation. An evaluation showed that incorporating the pitch-synchronous cross correlation cost performed better than using a conventional cost function. In addition, an opinion test to assess the naturalness of the synthesized speech indicated that the proposed method was 0.7 points better on a seven-point MOS (Mean Opinion Score) scale than the conventional system. This paper also discusses other improvements in the performance of text-to-speech systems. In this session, we will demonstrate our Japanese text-to-speech system.
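The abstract does not give the formula for the pitch-synchronous cross correlation cost, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a frame is cut around a pitch mark on each side of the join, that the two frames are compared with a zero-lag normalized cross correlation, and that the cost is one minus that correlation. The function names, the `half_len` parameter, and the cost definition are all hypothetical.

```python
import math


def pitch_synchronous_frame(signal, pitch_mark, half_len):
    """Cut a frame of 2*half_len + 1 samples centered on a pitch mark (assumed layout)."""
    start, end = pitch_mark - half_len, pitch_mark + half_len + 1
    if start < 0 or end > len(signal):
        raise ValueError("frame extends past the signal boundaries")
    return signal[start:end]


def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross correlation of two equal-length frames, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("frames must have equal length")
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def concatenation_cost(left, left_mark, right, right_mark, half_len):
    """Hypothetical cost: 0 when the frames are in phase, 2 when they are inverted."""
    a = pitch_synchronous_frame(left, left_mark, half_len)
    b = pitch_synchronous_frame(right, right_mark, half_len)
    return 1.0 - normalized_cross_correlation(a, b)
```

Under these assumptions, two units whose waveforms line up at their pitch marks receive a cost near zero, while a phase mismatch at the join raises the cost, which is the behavior the abstract attributes to the proposed concatenation cost.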


Cite as: Nukaga, N., Kamoshida, R., Nagamatsu, K. (2004) Unit selection using pitch synchronous cross correlation for Japanese concatenative speech synthesis. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 43-48

@inproceedings{nukaga04_ssw,
  author={Nobuo Nukaga and Ryota Kamoshida and Kenji Nagamatsu},
  title={{Unit selection using pitch synchronous cross correlation for Japanese concatenative speech synthesis}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={43--48}
}