Accessing Information in Spoken Audio
April 19-20, 1999
This paper discusses the evaluation of content extraction from audio sources. The most straightforward approach is to adapt existing methods for written sources to handle audio input. A transcription then becomes the representation of the audio source in written form; it must capture not only the word stream but also other information that aids in decoding the overall structure and content of the audio source, e.g., music, speaker changes, and speech repairs. The transcription must also support content annotation superimposed on the underlying speech transcription. When automated speech recognition is used to generate the transcription, there is the additional problem of evaluating content extraction from a noisy transcription. In addition, audio sources differ from their written counterparts in genre and therefore in structure, vocabulary, and even in how names are used. If the audio includes spontaneous conversational speech, as opposed to planned speech, these differences become still more pronounced. We discuss how these differences affect the adaptation of text-based extraction evaluation to audio input. In addition, we describe two new content extraction evaluations that have been designed for use with both audio and written materials.
Bibliographic reference. Hirschman, Lynette / Burger, John / Palmer, David / Robinson, Patricia (1999): "Evaluating content extraction from audio source", In Access-Audio-1999, 54-59.