For synthesised speech to be an effective substitute for a visual display, the displayed text must be divided into a set of meaningful utterable elements, and the user must be given controls to select these elements in a meaningful sequence. The textual content of visually conceived databases may well have value for blind users, but their page layouts are frequently complex and would give rise to ambiguities and errors in a simple display-to-speech process that utters each line of the display in turn.
The development of software-based layout processing to produce an ordered sequence of meaningful utterable elements linked to the user controls is discussed in the context of implementing a PC-based synthetic speech system for on-line interaction with videotex. Computationally low-cost techniques for generating stress, rhythm and intonation patterns for the synthesised output are also outlined, and consideration is given to generating prosodic enhancement of the synthesised speech output from significant features of the display layout. Further application of these techniques to other synthetic speech systems is addressed briefly.
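The ambiguity introduced by line-by-line utterance, and the kind of ordering that layout processing aims to recover, can be illustrated with a minimal sketch. It is not the method of the paper: the two-column page, the blank-gutter heuristic, and the names `find_gutter` and `utterable_elements` are all assumptions made for illustration only.

```python
def find_gutter(lines, min_width=2):
    """Return (start, end) of the first interior run of character columns
    that is blank on every line, or None if there is no such run.
    Hypothetical heuristic, assumed here for illustration."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[c] == " " for row in padded) for c in range(width)]
    start = None
    for c, is_blank in enumerate(blank):
        if is_blank:
            if start is None:
                start = c
        else:
            # A run only counts as a gutter if it has text on both sides.
            if start is not None and start > 0 and c - start >= min_width:
                return start, c
            start = None
    return None


def normalise(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def utterable_elements(lines):
    """Split a fixed-width page into elements to be uttered in order:
    the left column first, then the right column, when a gutter is found;
    otherwise the whole page as a single element."""
    gutter = find_gutter(lines)
    if gutter is None:
        return [normalise(" ".join(lines))]
    start, end = gutter
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    left = normalise(" ".join(row[:start] for row in padded))
    right = normalise(" ".join(row[end:] for row in padded))
    return [element for element in (left, right) if element]


# An invented two-column display page.
page = [
    "Trains to London   Weather today",
    "depart hourly      is dry and",
    "from platform 2    mild",
]

# Simple display-to-speech: each line uttered in turn, mixing the columns.
naive = normalise(" ".join(page))

# Layout-aware: one utterable element per column.
elements = utterable_elements(page)
```

Here `naive` interleaves the two columns ("Trains to London Weather today depart hourly ..."), whereas `elements` holds each column as a separate utterance that a user control could select in turn. Real videotex pages carry colour, graphics and other display attributes that a sketch of this kind ignores.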
Bibliographic reference. King, Robin W. (1989): "Layout processing, user control and prosody insertion in an on-line synthetic speech system", In EUROSPEECH-1989, 1121-1124.