Accessing Information in Spoken Audio

April 19-20, 1999
Cambridge, UK

Finding Information in Audio: A New Paradigm for Audio Browsing/Retrieval

Julia Hirschberg, Steve Whittaker, Don Hindle, Fernando Pereira and Amit Singhal

AT&T Labs - Research, Florham Park, NJ, USA

Information retrieval from audio data is sharply different from information retrieval from text, not simply because speech recognition errors affect retrieval effectiveness, but more fundamentally because of the linear nature of speech, and of the differences in human capabilities for processing speech versus text. We describe SCAN, a prototype speech retrieval and browsing system that addresses these challenges of speech retrieval in an integrated way. On the retrieval side, we use novel document expansion techniques to improve retrieval from automatic transcription to a level competitive with retrieval from human transcription. Given these retrieval results, our graphical user interface, based on the novel WYSIAWYH (``What you see is almost what you hear'') paradigm, infers text formatting such as paragraph boundaries and highlighted words from acoustic information and information retrieval term scores to help users navigate the errorful automatic transcription. This interface supports information extraction and relevance ranking demonstrably better than simple speech-alone interfaces, according to results of empirical studies.
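The document expansion idea mentioned above can be illustrated with a minimal sketch: use the errorful automatic transcript as a query against a clean text corpus, then append high-weight terms from the most similar documents, so that words the recognizer missed or garbled can still match user queries. This is only an illustrative TF-IDF-based approximation, not the authors' actual implementation; all function names and parameters here are invented for the example.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute simple TF-IDF vectors for a list of token lists."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps weights positive for terms seen in every document.
        vectors.append({t: tf[t] * math.log((n + 1) / (df[t] + 0.5)) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_document(transcript, corpus, k=2, terms_per_doc=3):
    """Append the highest-weight terms of the k corpus documents most
    similar to the (possibly errorful) transcript."""
    vecs = tf_idf_vectors([transcript] + corpus)
    query, rest = vecs[0], vecs[1:]
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, rest[i]), reverse=True)
    expansion = []
    for i in ranked[:k]:
        top = sorted(rest[i], key=rest[i].get, reverse=True)[:terms_per_doc]
        expansion.extend(t for t in top if t not in transcript)
    return transcript + expansion

# Hypothetical example: the transcript lacks "trading"/"shares", but
# expansion pulls them in from the closest corpus document.
corpus = [["stock", "market", "trading", "shares"],
          ["weather", "rain", "forecast", "storm"]]
expanded = expand_document(["stock", "prices", "market"], corpus,
                           k=1, terms_per_doc=2)
```

In the sketch above, a query for "trading" would now match the expanded transcript even though the recognizer never emitted that word; the full paper evaluates a more sophisticated variant of this idea against retrieval from human transcription.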


Bibliographic reference.  Hirschberg, Julia / Whittaker, Steve / Hindle, Don / Pereira, Fernando / Singhal, Amit (1999): "Finding Information in Audio: A New Paradigm for Audio Browsing/Retrieval", In Access-Audio-1999, 117-122.