7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Design of a Mandarin Sentence Set for Corpus-Based Speech Synthesis by Use of a Multi-Tier Algorithm Taking Account of the Varied Prosodic and Spectral Characteristics

Jinfu Ni, Hisashi Kawai

ATR Spoken Language Translation Research Laboratories, Japan

This paper presents a multi-tier algorithm for extracting a sentence set from a large raw text corpus for Mandarin speech synthesis, taking account of varied prosodic and spectral characteristics. These characteristics are statistically analyzed from the text corpus and transcribed as syllable-sized unit candidates in a multi-tier way. The unit candidates cover all of the syllables, the typical phonetic and tone contexts of each syllable, and the effects of phrase construction and sentence intonation on the syllable. The algorithm seeks to maximize the coverage of unit candidates in the extracted sentence set. Experiments were run on a 580k-sentence corpus that includes dialog and news text. A set of 9,479 sentences was selected. It covers 87.7% of the primary prosodic and spectral characteristics in statements and 61.0% of those in questions. The paper also discusses the selection of the raw text corpus.
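The coverage-maximizing selection described above can be illustrated with a minimal sketch. The abstract does not specify how the multi-tier algorithm works internally, so the code below substitutes a plain greedy set-cover heuristic: at each step it picks the sentence that adds the most unit candidates not yet covered. The function names `greedy_select` and `coverage`, the `TOY_CORPUS` data, and the flat unit labels (toned pinyin syllables, with no separate tiers for context or intonation) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy corpus: sentence id -> set of unit-candidate labels.
# Labels are flat toned syllables; the paper's tiers are NOT modeled here.
TOY_CORPUS: Dict[str, Set[str]] = {
    "s1": {"ni3", "hao3"},
    "s2": {"ni3", "hao3", "ma5"},
    "s3": {"xie4", "xie5"},
    "s4": {"hao3"},
}


def greedy_select(sentence_units: Dict[str, Set[str]],
                  max_sentences: int) -> List[str]:
    """Greedy set-cover heuristic (an assumption, not the paper's method).

    Repeatedly pick the sentence that contributes the most unit
    candidates not yet covered, until the budget is used up or no
    remaining sentence adds anything new.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(sentence_units)
    while remaining and len(selected) < max_sentences:
        # Ties are broken by dictionary insertion order.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left to gain from any sentence
        selected.append(best)
        covered |= gain
        del remaining[best]
    return selected


def coverage(selected: List[str],
             sentence_units: Dict[str, Set[str]]) -> float:
    """Fraction of all unit candidates in the corpus that the selection covers."""
    all_units: Set[str] = set().union(*sentence_units.values())
    if not all_units:
        return 0.0
    covered: Set[str] = set()
    for sid in selected:
        covered |= sentence_units[sid]
    return len(covered) / len(all_units)
```

Calling `greedy_select(TOY_CORPUS, 2)` and passing the result to `coverage` gives the share of unit candidates covered by a two-sentence set. A closer approximation of the approach in the abstract would presumably label units per tier (syllable, phonetic and tone context, phrase and intonation effects) and might weight them by corpus frequency, but those details are not given here.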



Bibliographic reference. Ni, Jinfu / Kawai, Hisashi (2002): "Design of a Mandarin sentence set for corpus-based speech synthesis by use of a multi-tier algorithm taking account of the varied prosodic and spectral characteristics", in ICSLP-2002, 2361-2364.