7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

An Architecture for a Multi-Modal Web Browser

Cristiana Armaroli (1), Ivano Azzini (2), Lorenza Ferrario (1), Toni Giorgino (2), Luca Nardelli (1), Marco Orlandi (1), Carla Rognoni (2)

(1) ITC-irst, Italy; (2) Università di Pavia, Italy

The very rapid evolution of telecommunication technology is leading to the convergence of fixed and mobile networks and devices. At present, people very commonly access the Web over Internet connections using HTML and/or WML (Wireless Markup Language) browsers, and current portable devices (e.g. PC/PDA, GPRS/WAP phones) offer a range of features (e.g. large memories, graphical displays, friendly user interfaces, and communication interfaces, including the possibility of installing Internet browsers) that make them suitable for hosting almost all of the applications normally run on standard PCs. Their main limitation is their reduced input/output capability: they frequently lack an alphanumeric keyboard and have very small displays. For such devices, multi-modal browsers with voice input/output capabilities, possibly combined with other input devices (e.g. graphic pointing, touch screens, small numeric keyboards), should satisfy a wide variety of user requirements.

The idea we propose consists of the definition (and subsequent realization) of an architecture capable of handling multi-modal browsing through the synchronization of HTML and VoiceXML documents. In doing this, we have to consider issues related to the variability of user/terminal profiles, as well as issues related to adapting the layout to different presentation modalities (e.g. spatial/temporal axes and hyperlinking). VoiceXML enables users to browse documents by speaking and listening over a phone, but it does not support a graphical interface, as HTML or WML do. Rather than adding new features to existing HTML, WML or VoiceXML documents, we propose to synchronize the different documents through a dedicated platform. This approach has the advantage of allowing, in a fairly general way, multi-modal browsing of existing HTML documents simply by developing the corresponding VoiceXML documents. Nevertheless, we do not exclude the possibility of defining an XML schema that includes, on a general basis, both HTML and VoiceXML syntax, thus allowing a multi-modal application to be defined in a single document. What we want to point out is that the proposed system can be used in a fairly general way, provided that the markup documents describing the web service to be realized are correctly interpreted by specific components.

The work presented in this paper has been partially developed within the E.U. project Homey [2, 3]. The purpose of this project is to monitor the clinical state of chronic patients (in particular, patients affected by hypertension) by means of the telephone, as described in Section 3. Another application under investigation is the use of the multi-modal browser to access the WebFabIS information system (a data management system, also described in Section 3). The benefits resulting from the adoption of mobile devices are currently being investigated.
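The abstract does not describe how the synchronization platform works internally, so the following Python sketch is purely hypothetical. It assumes that each logical page of a service exists as a hand-written HTML document paired with a corresponding VoiceXML document, and that, when the user navigates in one modality, the platform's job is to tell the browser of the other modality which document to load. The names `SyncManager`, `register` and `on_navigation`, and the modality labels `"html"` and `"voice"`, are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ModalityPair:
    """Hypothetical pairing of one HTML page with its corresponding VoiceXML document."""
    html_url: str
    vxml_url: str


@dataclass
class SyncManager:
    """Illustrative platform component (not the authors' design) that keeps a
    graphical (HTML) view and a voice (VoiceXML) view on the same logical page."""
    pairs: Dict[str, ModalityPair] = field(default_factory=dict)
    current: Optional[str] = None  # logical page both modalities are showing

    def register(self, page_id: str, html_url: str, vxml_url: str) -> None:
        # The HTML/VoiceXML mapping lives in the platform, so the documents
        # themselves need no new markup features.
        self.pairs[page_id] = ModalityPair(html_url, vxml_url)

    def on_navigation(self, page_id: str, source: str) -> Dict[str, str]:
        """Handle a navigation event raised by one modality and return the
        document the *other* modality's browser must load to stay in sync."""
        pair = self.pairs[page_id]  # KeyError if the page was never registered
        if source == "html":
            update = {"voice": pair.vxml_url}
        elif source == "voice":
            update = {"html": pair.html_url}
        else:
            raise ValueError(f"unknown modality: {source!r}")
        self.current = page_id
        return update


if __name__ == "__main__":
    sync = SyncManager()
    sync.register("measure", "measure.html", "measure.vxml")
    # The user clicks a link in the HTML browser; the voice browser follows.
    print(sync.on_navigation("measure", source="html"))
```

Keeping the mapping outside the documents mirrors the choice stated above of synchronizing existing HTML and VoiceXML documents through a separate platform rather than extending either markup language.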


Bibliographic reference.  Armaroli, Cristiana / Azzini, Ivano / Ferrario, Lorenza / Giorgino, Toni / Nardelli, Luca / Orlandi, Marco / Rognoni, Carla (2002): "An architecture for a multi-modal web browser", In ICSLP-2002, 2553-2556.