The ESCA Workshop on Speech Synthesis
September 25-28, 1990
To achieve a concatenation-type Japanese text-to-speech system, we propose two basic procedures. The first is the use of phoneme segments with multiple tri-phone labels as the fundamental synthesis units. Assigning multiple tri-phone labels to a segment effectively increases the variation of the synthesis units. The second is a segment concatenation procedure that takes into account the continuity of feature parameters at the segment junctions. A distortion measure at the segment junctions is introduced, which indicates how well synthesis units combine. The proposed procedures produce natural and distinct speech.
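The idea of choosing segments by junction distortion can be illustrated with a minimal sketch. The abstract does not give the distortion formula or the search method, so everything here is an assumption: the distortion is taken to be the Euclidean distance between the feature vectors on either side of a junction, the search is a dynamic-programming pass over the candidates, and the names `junction_distortion` and `select_segments` are hypothetical.

```python
import math


def junction_distortion(left_end, right_start):
    """Assumed measure: Euclidean distance between the feature vector at the
    end of the left segment and the one at the start of the right segment."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left_end, right_start)))


def select_segments(candidates):
    """Pick one candidate per phoneme position so that the summed junction
    distortion is minimal (dynamic programming, an assumed search method).

    candidates[i] is a list of (start_features, end_features) tuples for
    position i. Returns (total_distortion, chosen_indices).
    """
    # cost[j]: smallest cumulative distortion of a path ending in candidate j
    cost = [0.0] * len(candidates[0])
    back = []  # back[i][j]: best predecessor of candidate j at position i + 1
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for start, _ in cur:
            options = [
                cost[k] + junction_distortion(prev[k][1], start)
                for k in range(len(prev))
            ]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last position to the first.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path
```

In this sketch, a segment carrying several tri-phone labels would simply appear as a candidate at every position whose context matches one of its labels, which is one way the larger effective inventory could enter the search.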
Bibliographic reference. Nomura, Tetsuya / Mizuno, Hideyuki / Sato, Hirokazu (1990): "Speech synthesis by optimum concatenation of phoneme segments", In SSW1-1990, 39-42.