8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Using Corpus-Based Methods for Spoken Access to News Texts on the Web

Alexandra Klein (1), Harald Trost (2)

(1) Austrian Research Institute for Artificial Intelligence, Austria
(2) University of Vienna, Austria

The system described in this paper relies on both a multimodal corpus and a written newspaper corpus to process spoken and written user requests for Austrian news texts. Requests may be spontaneous spoken or written utterances as well as mouse clicks; user actions may concern the actual search, but also control of the browser. Because of spontaneous utterances, a large vocabulary, and multimodal interaction, interpreting the user request and generating an appropriate system response is often difficult. In addition to a controller module, the system uses data from two corpora to compensate for the difficulties associated with this scenario. Multimodal user actions, which were collected in Wizard-of-Oz experiments, serve as a basis for identifying patterns in users' spontaneous utterances. Furthermore, news documents are used to obtain background knowledge that can contribute to query expansion whenever the interpretation of users' utterances encounters ambiguity or underspecification in the search terms.
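
To make the idea of corpus-based query expansion concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how background knowledge is extracted from the news documents, so simple document-level co-occurrence counting is assumed here. The names `tokenize` and `expand_query`, the `top_k` and `stopwords` parameters, and the sample documents are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (incl. German umlauts)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def expand_query(query_terms, documents, top_k=3, stopwords=frozenset()):
    """Suggest expansion terms for an underspecified query.

    Assumption: a term is a useful expansion candidate if it occurs in many
    news documents that also contain at least one of the query terms.
    """
    query = {term.lower() for term in query_terms}
    counts = Counter()
    for doc in documents:
        tokens = set(tokenize(doc))
        if query & tokens:
            # Count each co-occurring term once per matching document.
            counts.update(tokens - query - set(stopwords))
    # Rank by document count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical mini-corpus standing in for the newspaper corpus.
news = [
    "Vienna election results announced",
    "Election campaign in Vienna continues",
    "Football match in Graz",
]
suggestions = expand_query(["election"], news, top_k=1, stopwords={"in"})
```

In this toy corpus, "vienna" appears in both documents that mention "election", so it is ranked first. A real system would likely weight candidates (for example by term specificity) rather than rely on raw counts.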

Bibliographic reference. Klein, Alexandra / Trost, Harald (2003): "Using corpus-based methods for spoken access to news texts on the web", in EUROSPEECH-2003, 1037-1040.