This paper discusses issues that arise when applying the IBM Audio-Indexing System to video retrieval. These include the relationship between speech transcription accuracy and retrieval performance, query processing schemes, and the critical problem of mapping between cues in speech and the relevant video shots. The temporal relationship between the occurrence of cues in speech transcripts and relevant shots is quantified, and simple schemes for performing this mapping are then described and evaluated. Experiments demonstrate the promise of more sophisticated schemes involving up-front video ranking, and one possible implementation is discussed. Techniques are evaluated using the TREC-2002 Video Track queries and corpus, comprising a total of 68.45 hours of video.
Cite as: Nock, H.J., Iyengar, G., Neti, C. (2003) Issues in speech-based retrieval of video. Proc. ISCA Workshop on Multilingual Spoken Document Retrieval (MSDR 2003), 67-72
@inproceedings{nock03_msdr,
  author={H. J. Nock and G. Iyengar and C. Neti},
  title={{Issues in speech-based retrieval of video}},
  year=2003,
  booktitle={Proc. ISCA Workshop on Multilingual Spoken Document Retrieval (MSDR 2003)},
  pages={67--72}
}