ISCA Archive SSW 2004

A concatenative speech synthesis method using context dependent phoneme sequences with variable length as search units

Hiroyuki Segi, Tohru Takagi, Takayuki Ito

This paper proposes a new concatenative speech synthesis method that uses variable-length, context-dependent phoneme sequences as search units. Using Japanese broadcast news programs as a speech database, we synthesized Japanese news sentences not included in that database and evaluated the synthesized speech subjectively. The results were as follows: (1) 77% of the speech synthesized by the proposed method was preferred to that synthesized by the conventional method; (2) speech synthesis runtime was reduced to one-tenth of that of the conventional method; (3) the mean opinion score (MOS) was 3.94 in a five-point MOS test, and 37% of the synthesized speech had the same naturalness as natural speech; and (4) speech synthesis runtime increased only slightly despite the larger speech database. These results show the effectiveness of the proposed method.


Cite as: Segi, H., Takagi, T., Ito, T. (2004) A concatenative speech synthesis method using context dependent phoneme sequences with variable length as search units. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 115-120.

@inproceedings{segi04_ssw,
  author={Hiroyuki Segi and Tohru Takagi and Takayuki Ito},
  title={{A concatenative speech synthesis method using context dependent phoneme sequences with variable length as search units}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={115--120}
}