5th International Conference on Spoken Language Processing
In this paper, we present a prototype speech-based Web browser, SALSA1.0, and describe some of the research issues we need to address in building this system for Hong Kong users. SALSA1.0 allows the user to speak English command words as well as partial or complete link names on any page. The main research issues in building SALSA1.0 are (1) how to handle large accent variations and mixed-language input, and (2) how to handle unknown words, especially proper names, in Web links. The recognition engine for SALSA1.0 is trained on WSJ data and then retrained on a small amount of Hong Kong-accented WSJ data to handle accent variations. An edit-distance algorithm replaces each unknown word with the closest known word in the word network used for recognition. With these methods, the link name recognition rate is 91.20% for links without unknown words and 82.40% for links with unknown words. SALSA is currently being developed into a multilingual, natural language-based Intranet service provider for HKUST campus information access.
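The unknown-word replacement step described in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the distance is computed over spellings or pronunciations, nor what costs are used, so this example assumes a unit-cost Levenshtein distance over lowercase spellings. The function names and the sample vocabulary are hypothetical and are not taken from SALSA1.0.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_known_word(word: str, vocabulary: Iterable[str]) -> str:
    """Return the vocabulary entry with the smallest edit distance to word."""
    w = word.lower()
    return min(vocabulary, key=lambda v: edit_distance(w, v.lower()))


def normalize_link(link_name: str, vocabulary: List[str]) -> List[str]:
    """Map each token of a link name to a known word (assumed whitespace tokens)."""
    known = {v.lower() for v in vocabulary}
    tokens = []
    for token in link_name.split():
        t = token.lower()
        # Known words pass through; unknown words are replaced by the nearest one.
        tokens.append(t if t in known else closest_known_word(t, vocabulary))
    return tokens
```

With a hypothetical vocabulary of `["campus", "kowloon", "library", "map"]`, the unknown token "Kowlon" is one edit away from "kowloon" and is replaced by it, while tokens already in the vocabulary are left unchanged.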
Bibliographic reference. Fung, Pascale / Cheung, Chi Shun / Lam, Kwok Leung / Liu, Wai Kat / Lo, Yuen Yee (1998): "SALSA version 1.0: a speech-based web browser for Hong Kong English", In ICSLP-1998, paper 0942.