5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

A New Framework to Provide High-Controllability Speech Signal And the Development of a Workbench for It

Masanobu Abe, Hideyuki Mizuno, Satoshi Takahashi, Shin'ya Nakajima

NTT Human Interface Labs., Yokosuka-shi, Kanagawa, Japan

This paper proposes a new framework to enhance access to and control of speech signals. To enhance accessibility, the framework assigns multi-layered tags, such as orthographic and phonetic transcriptions, to a speech signal. The tags also make it possible to synchronize the speech signal precisely with animation. In terms of control, the framework provides hybrid speech that combines human speech with speech synthesis-by-rule. Its quality ranges from simple TTS (the worst case) to encoded natural speech (the best case), depending on the resources available: text, fundamental frequency (F0) contour, power contour, phoneme duration, and so on. To create speech messages based on the framework, we developed a workbench that employs speech synthesis and recognition techniques. Important features of the workbench are a powerful graphical user interface (GUI) for manipulating prosodic information and a function for synthesizing speech in a trial-and-error manner. An evaluation in which speech messages were created with the workbench shows that it performs well.
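
As an illustration only, the two ideas in the abstract (time-aligned tag layers and prosody that falls back from natural values to rule-generated ones) might be sketched as below. This is not the authors' implementation: the names `Tag`, `TaggedSpeech`, `prosody_sources`, and the resource keys are hypothetical, and the per-parameter fallback policy is an assumption about how "depending on the resources available" could be realized.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Tag:
    """One labeled interval on a tag layer (times in seconds)."""
    start: float
    end: float
    label: str


@dataclass
class TaggedSpeech:
    """Hypothetical container: several tag layers aligned to one signal."""
    layers: Dict[str, List[Tag]] = field(default_factory=dict)

    def add(self, layer: str, start: float, end: float, label: str) -> None:
        self.layers.setdefault(layer, []).append(Tag(start, end, label))

    def at(self, t: float) -> Dict[str, str]:
        """Return the label active at time t for each layer that covers t."""
        return {
            name: tag.label
            for name, tags in self.layers.items()
            for tag in tags
            if tag.start <= t < tag.end
        }


# Assumed resource names; the abstract lists these kinds of resources
# but does not define identifiers for them.
PROSODIC_PARAMS = ("f0_contour", "power_contour", "phoneme_duration")


def prosody_sources(available: Iterable[str]) -> Dict[str, str]:
    """Assumed policy: use a natural value when supplied, else a rule."""
    have = set(available)
    return {p: ("natural" if p in have else "rule") for p in PROSODIC_PARAMS}
```

With text alone, every parameter in this sketch falls back to `"rule"`, which corresponds to the plain TTS end of the quality range; supplying all three contours corresponds to the natural-speech end. A lookup such as `msg.at(0.05)` returns the word and phoneme active at that instant, which is the kind of query an animation synchronizer would need.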


