Accessing Information in Spoken Audio

April 19-20, 1999
Cambridge, UK

Optimal Parameters for Segmenting a Stream of Audio into Speech Documents

Gerard Quinn and Alan Smeaton

School of Computer Applications, Dublin City University, Dublin, Ireland

Indexing and retrieval of spoken documents is a desirable feature in a digital library, because a wealth of contemporary information is available to us only in this medium. In this paper we describe experiments carried out on the TREC-6 Spoken Document Retrieval (SDR) collection to determine the optimal parameters for our speech IR system. In the TREC task the data corresponded to complete news broadcasts, and the boundaries between news stories were marked up manually. In an operational news speech retrieval system such as our own, news story boundaries are not always part of the speech data, and shifts from one broadcast story to the next are difficult to detect automatically. We describe our approach of splitting the stream of audio into speech documents of fixed length and analyse the results of each configuration, culminating in an optimal set of parameters.
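The abstract does not spell out the segmentation procedure beyond producing fixed-length speech documents. As a rough illustration only, the following sketch assumes the audio has already been transcribed into a stream of words and cuts that stream into windows of `window` words, advancing `step` words each time. The function name, the use of words (rather than seconds) as the unit, and overlap controlled by `step` are all assumptions for illustration, not details taken from the paper.

```python
from typing import List


def segment_fixed_length(words: List[str], window: int, step: int) -> List[List[str]]:
    """Split a transcribed word stream into fixed-length pseudo-documents.

    Hypothetical sketch: `window` is the number of words per document and
    `step` is how far the window advances; step < window yields overlap.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    docs: List[List[str]] = []
    start = 0
    while start < len(words):
        # Take the next window; the final one may be shorter than `window`.
        docs.append(words[start:start + window])
        if start + window >= len(words):
            break  # this window already reaches the end of the stream
        start += step
    return docs


if __name__ == "__main__":
    stream = "the minister announced new funding today while in sport".split()
    for doc in segment_fixed_length(stream, window=4, step=2):
        print(" ".join(doc))
```

Setting `step` equal to `window` gives non-overlapping segments; a smaller `step` makes neighbouring documents share words, which reduces the chance that a short story is split across a segment boundary at the cost of indexing more text. The values used here are illustrative; the paper's own parameter choices are not stated in this abstract.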

Bibliographic reference.  Quinn, Gerard / Smeaton, Alan (1999): "Optimal Parameters for Segmenting a Stream of Audio into Speech Documents", In Access-Audio-1999, 96-101.