The ESCA Workshop on Speech Synthesis

September 25-28, 1990
Autrans, France

Speech Synthesis by Optimum Concatenation of Phoneme Segments

Tetsuya Nomura, Hideyuki Mizuno, Hirokazu Sato

NTT Human Interface Laboratories, Midori-Cho, Musashino-Shi, Tokyo, Japan

To realize a concatenation-type Japanese text-to-speech system, we propose two basic procedures. The first uses phoneme segments with multiple tri-phone labels as the fundamental synthesis units; assigning multiple tri-phone labels to a segment effectively increases the variety of available synthesis units. The second is a segment concatenation procedure that takes account of feature-parameter continuity at the segment junctions. A distortion measure at the segment junction is introduced, which indicates how well synthesis units are combined. The proposed procedures produce natural and distinct speech.
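The second procedure can be illustrated with a minimal sketch. The abstract does not specify the distortion measure or the search method, so everything here is an assumption made for illustration: each candidate segment is reduced to a pair of feature vectors (at its start and at its end), the junction distortion is taken to be the squared Euclidean distance between the end features of one segment and the start features of the next, and the sequence with the smallest summed distortion is found by dynamic programming. The names `Segment`, `junction_distortion`, and `select_segments` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (feature vector at segment start, feature vector at segment end).
Segment = Tuple[Sequence[float], Sequence[float]]


def junction_distortion(left: Segment, right: Segment) -> float:
    """Assumed measure: squared Euclidean distance across the junction."""
    return sum((a - b) ** 2 for a, b in zip(left[1], right[0]))


def select_segments(candidates: List[List[Segment]]) -> Tuple[List[int], float]:
    """Choose one candidate per phoneme position so that the summed
    junction distortion is minimal. Returns (chosen indices, total distortion)."""
    if not candidates or any(not c for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # cost[j]: best total distortion of any path ending at candidate j
    # of the position processed so far.
    cost = [0.0] * len(candidates[0])
    back: List[List[int]] = []  # back[t][j]: best predecessor of candidate j at position t + 1

    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for seg in cur:
            totals = [cost[i] + junction_distortion(p, seg) for i, p in enumerate(prev)]
            best = min(range(len(totals)), key=totals.__getitem__)
            new_cost.append(totals[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the best path back from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Toy one-dimensional features; three phoneme positions, two candidates each.
    toy = [
        [([0.0], [0.0]), ([0.0], [5.0])],
        [([5.0], [1.0]), ([0.0], [9.0])],
        [([1.0], [0.0]), ([8.0], [0.0])],
    ]
    print(select_segments(toy))
```

In this sketch, several candidates at one position correspond to several stored segments whose tri-phone labels match the required context, which is where a larger effective unit inventory gives the search more junctions to choose from.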

