ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2008)

Brisbane, Australia
September 21, 2008

Discriminative Word-Spotting Using Ordered Spectro-Temporal Patch Features

Tony Ezzat, Tomaso Poggio

Center for Biological and Computational Learning, McGovern Institute for Brain Research, MIT, Cambridge, MA, USA

We present a novel architecture for word-spotting that is trained from a small number of examples to classify an utterance as containing a target keyword or not. The architecture relies on a new feature set consisting of ordered spectro-temporal patches extracted from the exemplar mel-spectra of target keywords. We introduce a local pooling operation across frequency and time that gives the extracted patch features the flexibility to match novel, unseen instances of the keywords. Finally, we describe how to train a support vector machine classifier to separate keyword from non-keyword patch feature responses. We present preliminary results indicating that our word-spotting architecture achieves a detection rate of 70-95% at false positive rates of about 0.25-2 false positives per minute.
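The local pooling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure or the pooling rule, so the sum-of-squared-differences distance, the min-over-neighbourhood pooling, and the names `patch_distance`, `pooled_response`, `df`, and `dt` are all assumptions made for illustration. The spectrogram is taken to be a list of rows indexed as `spec[frequency][time]`.

```python
def patch_distance(spec, patch, f, t):
    """Sum of squared differences between `patch` and the window of
    `spec` whose top-left corner is at (f, t). Assumed measure."""
    return sum(
        (spec[f + i][t + j] - patch[i][j]) ** 2
        for i in range(len(patch))
        for j in range(len(patch[0]))
    )


def pooled_response(spec, patch, f0, t0, df=1, dt=2):
    """Smallest patch distance over all windows within +/-df frequency
    bins and +/-dt time frames of the nominal position (f0, t0).
    Taking the best match in a neighbourhood is an assumed pooling rule."""
    n_f, n_t = len(spec), len(spec[0])
    p_f, p_t = len(patch), len(patch[0])
    best = float("inf")
    # Clip the search neighbourhood so every window stays inside spec.
    for f in range(max(0, f0 - df), min(n_f - p_f, f0 + df) + 1):
        for t in range(max(0, t0 - dt), min(n_t - p_t, t0 + dt) + 1):
            best = min(best, patch_distance(spec, patch, f, t))
    return best
```

Under these assumptions, evaluating `pooled_response` for each patch in a fixed order would give one feature vector per utterance, which is the kind of input a support vector machine could then be trained on. Setting `df=0` and `dt=0` disables pooling and compares the patch only at its nominal position.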


Bibliographic reference.  Ezzat, Tony / Poggio, Tomaso (2008): "Discriminative word-spotting using ordered spectro-temporal patch features", In SAPA-2008, 35-40.