7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Using Start/End Timings of Spectral Transitions Between Phonemes in Concatenative Speech Synthesis

Toshio Hirai (1), Seiichi Tenpaku (1), Kiyohiro Shikano (2)

(1) Arcadia Inc., Japan; (2) Nara Institute of Science and Technology, Japan

The definition of "phoneme boundary timing" in a speech corpus affects the quality of concatenative speech synthesis systems. For example, if a selected speech unit does not properly match the unit required by the target phoneme environment, quality may be degraded. In this paper, a dynamic segment boundary definition is proposed. In this definition, the concatenation point is chosen from either the start or the end timing of the spectral transition, depending on the phoneme environment at the boundary. In a listening test comparing the naturalness of the conventional and proposed methods, 100 Japanese place names were randomly selected and synthesized. As judged by four subjects, the naturalness ratio was 1 to 3.3 (conventional vs. proposed).
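The boundary selection idea described above can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: spectral transitions are detected by thresholding the Euclidean distance between consecutive feature frames (for example, cepstral vectors), and the environment-dependent choice between the start and end timing is supplied as a function. The helper names (`spectral_change`, `transition_span`, `concatenation_point`) are hypothetical, and `example_rule` is a placeholder, not the rule used in the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def spectral_change(frames: Sequence[Frame]) -> List[float]:
    # d[i] is the Euclidean distance between frame i and frame i + 1
    # (an assumed, simple spectral transition measure).
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def transition_span(frames: Sequence[Frame], boundary: int,
                    threshold: float) -> Tuple[int, int]:
    """Return (start, end) frame indices of the transition around `boundary`.

    `boundary` is the index of the first frame of the right phoneme, so the
    labeled boundary lies between frames boundary - 1 and boundary.
    """
    d = spectral_change(frames)
    i = boundary - 1  # position in d that straddles the labeled boundary
    if not (0 <= i < len(d)) or d[i] <= threshold:
        return boundary, boundary  # no transition found: keep the label
    lo = i
    while lo > 0 and d[lo - 1] > threshold:  # extend leftwards
        lo -= 1
    hi = i
    while hi < len(d) - 1 and d[hi + 1] > threshold:  # extend rightwards
        hi += 1
    # The transition begins at frame lo and settles at frame hi + 1.
    return lo, hi + 1


def concatenation_point(frames: Sequence[Frame], boundary: int, left: str,
                        right: str, threshold: float,
                        use_start: Callable[[str, str], bool]) -> int:
    # Pick the start or end of the transition depending on the phoneme
    # environment (left, right), as decided by the supplied rule.
    start, end = transition_span(frames, boundary, threshold)
    return start if use_start(left, right) else end


VOWELS = set("aiueo")


def example_rule(left: str, right: str) -> bool:
    # HYPOTHETICAL placeholder rule for illustration only:
    # cut at the transition start when the right phoneme is a vowel.
    return right in VOWELS


if __name__ == "__main__":
    # Toy one-dimensional "spectra": stable, then a transition, then stable.
    frames = [[0.0], [0.0], [0.0], [1.0], [3.0], [5.0], [5.0], [5.0]]
    print(transition_span(frames, 4, 0.5))                          # (2, 5)
    print(concatenation_point(frames, 4, "k", "a", 0.5, example_rule))  # 2
    print(concatenation_point(frames, 4, "a", "s", 0.5, example_rule))  # 5
```

The sketch returns frame indices rather than times so that it stays independent of the frame shift; multiplying an index by the frame period gives the corresponding timing.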


Bibliographic reference.  Hirai, Toshio / Tenpaku, Seiichi / Shikano, Kiyohiro (2002): "Using start/end timings of spectral transitions between phonemes in concatenative speech synthesis", In ICSLP-2002, 2357-2360.