8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Speech-Based Annotation and Retrieval of Digital Photographs

Timothy J. Hazen (1), Brennan Sherry (1), Mark Adler (2)

(1) MIT, USA
(2) Nokia Research Center, USA

This paper describes the development of a speech-based annotation and retrieval system for digital photographs. The system uses a client/server architecture that allows photographs to be captured and annotated on lightweight clients, such as mobile camera phones, and then processed, indexed, and stored on networked servers. For speech-based retrieval, we have developed a mixed-grammar recognition approach in which the speech recognizer constructs a single finite-state network. This network combines context-free grammars, which recognize and parse query carrier phrases and metadata phrases, with an unconstrained statistical n-gram model, which recognizes free-form search terms. We present experiments demonstrating successful retrieval of photographs using purely speech-based annotation and retrieval.
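
The division of labor in the mixed-grammar approach can be illustrated at the text level with a minimal sketch. It is not the system's implementation: it works on an already-transcribed query string instead of a recognition network, a regular expression stands in for the context-free grammars, and an unconstrained word slot stands in for the statistical n-gram model. The carrier phrases, the metadata phrases, and the function name `parse_query` are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical carrier and metadata phrases (assumptions for illustration;
# the actual grammars of the system are not given in this abstract).
CARRIER = r"(?P<carrier>show me|find|search for)"
OBJECT = r"(?:(?:all|the|my) )?(?:photos|pictures)"
# Constrained metadata phrase, e.g. "taken last week".
METADATA = r"(?: taken (?P<metadata>last (?:week|month|year)|today|yesterday))?"
# Free-form slot: any word sequence. In this sketch it stands in for the
# unconstrained n-gram portion of the network.
FREE = r"(?: (?:of|with|about) (?P<terms>.+?))?"

QUERY = re.compile(rf"^{CARRIER} {OBJECT}{FREE}{METADATA}$")


def parse_query(text):
    """Split a transcribed query into carrier, metadata, and free-form terms.

    Returns None when the query does not fit the constrained pattern.
    """
    match = QUERY.match(text.lower().strip())
    if match is None:
        return None
    terms = match.group("terms")
    return {
        "carrier": match.group("carrier"),
        "metadata": match.group("metadata"),
        "terms": terms.split() if terms else [],
    }


if __name__ == "__main__":
    print(parse_query("Show me photos of my dog taken last week"))
    print(parse_query("find pictures of the beach"))
```

In this sketch the constrained parts of the query are parsed into named fields, while the free-form slot accepts arbitrary words that would then be passed to the retrieval index as search terms.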

Bibliographic reference.  Hazen, Timothy J. / Sherry, Brennan / Adler, Mark (2007): "Speech-based annotation and retrieval of digital photographs", In INTERSPEECH-2007, 2165-2168.