Natural-sounding voice output for dialogue systems

Katherine Morton

In dialogue systems that use speech, current synthesis systems often produce voice output that sounds monotonous and unnatural and is tiring to listen to. Moreover, the speech is hard to listen to even over a span as short as a paragraph. In an interactive dialogue users become irritated with the system, and in other situations, such as when the system is giving instructions, the user can become bored or uninterested. Good speech output is important because the user is more aware of this mode than of the speech recognition mode: for example, errors in recognition can in principle be repaired by the system and thus need not come to the user's attention, but errors in speech output are less easily tolerated. Without some improvement, then, speech synthesis is not yet entirely practical for voice output in dialogue systems.

