In this paper we describe the development of a speech-based annotation and retrieval system for digital photographs. The system uses a client/server architecture which allows photographs to be captured and annotated on light-weight clients, such as mobile camera phones, and then processed, indexed and stored on networked servers. For speech-based retrieval we have developed a mixed grammar recognition approach which allows the speech recognition system to construct a single finite-state network combining context-free grammars, for recognizing and parsing query carrier phrases and metadata phrases, with an unconstrained statistical n-gram model for recognizing free-form search terms. Experiments demonstrating successful retrieval of photographs using purely speech-based annotation and retrieval are presented.
Bibliographic reference. Hazen, Timothy J. / Sherry, Brennan / Adler, Mark (2007): "Speech-based annotation and retrieval of digital photographs", In INTERSPEECH-2007, 2165-2168.