5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A New Synthetic Speech/Sound Control Language

Osamu Mizuno, Shin'ya Nakajima

NTT Human Interface Labs., Japan

The Multi-layered Speech/Sound Synthesis Control Language (MSCL) proposed herein facilitates the synthesis of several speech modes, such as nuance, mental state, and emotion, and allows speech to be synchronized easily with other media. MSCL is a multi-layered linguistic system comprising three layers: the semantic level layer (the S-layer), the interpretation level layer (the I-layer), and the parameter level layer (the P-layer). The S-layer is the description level for semantics such as emotional and emphasized speech. The I-layer is the description level for prosodic feature controls; it interprets S-layer scripts into controls at the I-layer level. The P-layer represents the prosodic parameters for speech synthesis. This multi-level description system is convenient for both lay and professional users. MSCL also provides many effective prosodic feature control functions, such as a time-varying pattern description function, absolute and relative control forms, and the SDS (Speaker Dependent Scale). MSCL enables more emotional and expressive synthetic speech than conventional TTS systems. This paper describes these functions and the effective prosodic feature controls possible with MSCL.
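
The layering described above can be illustrated with a minimal Python sketch. It is not MSCL syntax: the semantic labels, the rule table S_TO_I, the scale factors, and the names Prosody, interpret, apply_control, and to_parameters are all hypothetical, chosen only to show how a semantic label (S-layer) might be interpreted into absolute or relative prosodic controls (I-layer) and then resolved into concrete parameters (P-layer).

```python
from dataclasses import dataclass

# Hypothetical S-layer -> I-layer rules. Labels and values are illustrative
# assumptions, not taken from the paper. Each control is (mode, value):
#   "rel" scales the baseline value, "abs" overrides it.
S_TO_I = {
    "emphasis": {"pitch_hz": ("rel", 1.2), "amplitude": ("rel", 1.3),
                 "duration_ms": ("rel", 1.1)},
    "whisper": {"pitch_hz": ("abs", 120.0), "amplitude": ("rel", 0.5),
                "duration_ms": ("rel", 1.0)},
}


@dataclass
class Prosody:
    """P-layer stand-in: concrete prosodic parameters for one speech unit."""
    pitch_hz: float
    amplitude: float      # linear gain, 1.0 = baseline
    duration_ms: float


def interpret(label):
    """S-layer -> I-layer: look up prosodic controls for a semantic label."""
    return S_TO_I[label]


def apply_control(base_value, control):
    """Resolve one absolute or relative control against a baseline value."""
    mode, value = control
    if mode == "abs":
        return value
    if mode == "rel":
        return base_value * value
    raise ValueError(f"unknown control mode: {mode}")


def to_parameters(base, controls):
    """I-layer -> P-layer: apply every control to the baseline parameters."""
    return Prosody(
        pitch_hz=apply_control(base.pitch_hz, controls["pitch_hz"]),
        amplitude=apply_control(base.amplitude, controls["amplitude"]),
        duration_ms=apply_control(base.duration_ms, controls["duration_ms"]),
    )


baseline = Prosody(pitch_hz=200.0, amplitude=1.0, duration_ms=100.0)
emphasized = to_parameters(baseline, interpret("emphasis"))
whispered = to_parameters(baseline, interpret("whisper"))
```

In this sketch a lay user would only write the label ("emphasis"), while a professional user could edit the controls or the parameters directly, which mirrors the convenience the multi-level description is meant to offer.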

