September 22-25, 1997
This paper proposes a new framework to enhance access to and control of speech signals. To enhance accessibility, the proposed framework assigns multi-layered tags, such as orthographic transcriptions and phonetic transcriptions, to a speech signal. The tags also make it possible to precisely synchronize a speech signal with animation. In terms of control, the proposed framework provides hybrid speech that combines human speech and speech synthesis-by-rule. Its quality ranges from simple TTS (the worst case) to encoded natural speech (the best case), depending on the resources available: texts, fundamental frequency (F0) contour, power contour, phoneme duration, and so on. To create speech messages based on the proposed framework, we developed a workbench employing speech synthesis and recognition techniques. Important features of the workbench are a powerful GUI (Graphical User Interface) with which to manipulate prosodic information and a function to synthesize speech in a trial-and-error manner. An evaluation in which speech messages were created with the workbench shows that it performs well.
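The idea of multi-layered tags and resource-dependent hybrid speech can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `Tag` and `SpeechMessage`, the functions `phoneme_durations`, `prosody_sources`, and `rendering_mode`, and the three mode labels are all hypothetical, chosen only to show how the available resources (text, F0 contour, power contour, phoneme durations, encoded speech) might determine where a message falls between plain TTS and encoded natural speech.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Tag:
    """One label on a tag layer, aligned to the signal by time in seconds."""
    start: float
    end: float
    label: str


@dataclass
class SpeechMessage:
    """Hypothetical container: text plus optional tag layers and prosody."""
    text: str
    orthographic: Optional[List[Tag]] = None   # word-level layer
    phonetic: Optional[List[Tag]] = None       # phoneme-level layer
    f0_contour: Optional[List[float]] = None   # F0 values taken from human speech
    power_contour: Optional[List[float]] = None
    encoded_speech: Optional[bytes] = None     # encoded natural speech, if any


def phoneme_durations(msg: SpeechMessage) -> Optional[List[Tuple[str, float]]]:
    """Derive phoneme durations from the phonetic tag layer, if present."""
    if msg.phonetic is None:
        return None
    return [(t.label, t.end - t.start) for t in msg.phonetic]


def prosody_sources(msg: SpeechMessage) -> Dict[str, str]:
    """Report, per prosodic resource, whether it comes from human speech
    or would have to be generated by rule (assumed fallback)."""
    return {
        "duration": "human" if msg.phonetic else "rule",
        "f0": "human" if msg.f0_contour else "rule",
        "power": "human" if msg.power_contour else "rule",
    }


def rendering_mode(msg: SpeechMessage) -> str:
    """Place the message on the quality range described in the abstract."""
    if msg.encoded_speech is not None:
        return "encoded natural speech"   # best case
    if all(src == "rule" for src in prosody_sources(msg).values()):
        return "plain TTS"                # worst case: text only
    return "hybrid"                       # some prosody from human speech
```

With only text, `rendering_mode` reports plain TTS; supplying a phonetic layer or an F0 contour moves the message into the hybrid range. Because each tag carries start and end times, the same layers could also serve as timing cues for animation.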
Bibliographic reference. Abe, Masanobu / Mizuno, Hideyuki / Takahashi, Satoshi / Nakajima, Shin'ya (1997): "A new framework to provide high-controllability speech signal and the development of a workbench for it", In EUROSPEECH-1997, 541-544.