Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

A Concatenative Speech Synthesis Method Using Context Dependent Phoneme Sequences with Variable Length as Search Units

Hiroyuki Segi, Tohru Takagi, Takayuki Ito

NHK (Nippon Hoso Kyokai; Japan Broadcasting Corp.) Science and Technical Research Laboratories, Japan

This paper proposes a new concatenative speech synthesis method that uses variable-length, context-dependent phoneme sequences as search units. Using Japanese broadcast news programs as the speech database, we synthesized Japanese news sentences not included in that database and carried out subjective evaluations of the synthesized speech. The evaluations gave four results: (1) speech synthesized by the proposed method was preferred to speech from the conventional method in 77% of cases; (2) synthesis runtime was reduced to one-tenth that of the conventional method; (3) the mean opinion score (MOS) was 3.94 in a five-point MOS test, and 37% of the synthesized speech was judged as natural as natural speech; and (4) synthesis runtime increased only slightly when the speech database was enlarged. These results show the effectiveness of the proposed method.


Bibliographic reference. Segi, Hiroyuki / Takagi, Tohru / Ito, Takayuki (2004): "A concatenative speech synthesis method using context dependent phoneme sequences with variable length as search units", in SSW5-2004, 115-120.