5th International Conference on Spoken Language Processing
This paper introduces a paradigm for designing multimodal dialogue systems. The example task of the system is to retrieve particular information about different shops in the Tokyo Metropolitan area, such as their names, addresses and phone numbers. The system accepts speech and screen touching as input, and presents the retrieved information on a screen display. The speech recognition part is modeled by an FSN (finite state network) consisting of keywords and fillers, both of which are implemented with a DAWG (directed acyclic word-graph) structure. There are 306 keywords, consisting of district names and business names. The fillers accept roughly 100,000 non-keywords/phrases occurring in spontaneous speech. A variety of dialogue strategies are designed and evaluated based on an objective cost function that takes a set of actions and states as parameters. The expected dialogue cost is calculated for each strategy, and the best strategy is selected according to the keyword recognition accuracy.
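To make the last step concrete, the following is a minimal sketch of choosing a dialogue strategy by expected cost as a function of keyword recognition accuracy. It is not the paper's cost function: the three strategies ("direct", "confirm", "touch"), the retry-until-success model, and all cost values are hypothetical and were chosen only for illustration.

```python
# Illustrative sketch only. Strategy names, cost values, and the retry model
# are assumptions; the abstract does not specify the actual cost function.

# Hypothetical per-action costs (arbitrary units).
C_SPEAK = 1.0    # user speaks one query
C_CONFIRM = 0.5  # system asks for, and user gives, a confirmation
C_ERROR = 3.0    # user notices and recovers from a wrong retrieval
C_TOUCH = 2.5    # user selects the item by screen touch (assumed error-free)


def expected_costs(accuracy: float) -> dict[str, float]:
    """Expected dialogue cost of each hypothetical strategy.

    Speech attempts are assumed to repeat until the keyword is recognized,
    with independent success probability `accuracy` per attempt.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    p = accuracy
    return {
        # No confirmation: a misrecognition is paid for with C_ERROR.
        # E = C_SPEAK + (1 - p) * (C_ERROR + E)
        "direct": (C_SPEAK + (1.0 - p) * C_ERROR) / p,
        # Always confirm: errors are caught cheaply, but every turn costs more.
        # E = (C_SPEAK + C_CONFIRM) + (1 - p) * E
        "confirm": (C_SPEAK + C_CONFIRM) / p,
        # Touch input: fixed cost, independent of recognition accuracy.
        "touch": C_TOUCH,
    }


def best_strategy(accuracy: float) -> tuple[str, float]:
    """Return the (name, expected cost) pair with the lowest expected cost."""
    costs = expected_costs(accuracy)
    name = min(costs, key=costs.get)
    return name, costs[name]
```

Under these assumed costs, calling `best_strategy` over a range of accuracies shows the preferred strategy shifting from unconfirmed speech at high accuracy, to confirmed speech at moderate accuracy, to touch input at low accuracy. That is the kind of accuracy-dependent selection the abstract describes.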
Bibliographic reference. Furui, Sadaoki / Yamaguchi, Koh'ichiro (1998): "Designing a multimodal dialogue system for information retrieval", In ICSLP-1998, paper 0036.