Fifth ISCA ITRW on Speech Synthesis
June 14-16, 2004
This paper proposes a new concatenative speech synthesis method that uses context-dependent, variable-length phoneme sequences as search units. Using Japanese broadcast news programs as the speech database, we synthesize Japanese news sentences that are not included in that database and perform subjective evaluations of the synthesized speech. The evaluations show that (1) 77% of the speech synthesized by the proposed method was preferred over that synthesized by the conventional method, (2) synthesis runtime was reduced to one-tenth that of the conventional method, (3) the mean opinion score (MOS) was 3.94 in a five-point MOS test, and 37% of the synthesized speech had the same naturalness as natural speech, and (4) synthesis runtime increased only slightly when a larger speech database was used. These results show the effectiveness of the proposed method.
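The two subjective figures quoted above (a mean opinion score and a preference percentage) are standard summary statistics. As a minimal sketch of how such figures are typically computed, assuming listener ratings on a 1 to 5 scale and paired-comparison votes: the function names and the sample data below are hypothetical and are not taken from the paper's evaluation.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must lie between 1 and 5")
    return mean(ratings)


def preference_rate(votes, label):
    """Return the fraction of paired-comparison votes that chose `label`."""
    if not votes:
        raise ValueError("votes must not be empty")
    return sum(1 for v in votes if v == label) / len(votes)


# Hypothetical listener data, for illustration only.
ratings = [4, 4, 5, 3, 4]
votes = ["proposed", "proposed", "conventional", "proposed"]

mos = mean_opinion_score(ratings)
pref = preference_rate(votes, "proposed")
print(f"MOS = {mos:.2f}, preference for proposed = {pref:.0%}")
```

With real data, `ratings` would hold one score per listener and stimulus, and `votes` one choice per paired comparison.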
Bibliographic reference. Segi, Hiroyuki / Takagi, Tohru / Ito, Takayuki (2004): "A concatenative speech synthesis method using context dependent phoneme sequences with variable length as search units", In SSW5-2004, 115-120.