ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2008)
We present a novel architecture for word-spotting which is trained from a small number of examples to classify an utterance as containing a target keyword or not. The architecture relies on a new feature set of ordered spectro-temporal patches extracted from the exemplar mel-spectra of target keywords. A local pooling operation across frequency and time is introduced, which gives the extracted patch features the flexibility to match previously unseen instances of the keywords. Finally, we describe how to train a support vector machine classifier to separate keyword from non-keyword patch feature responses. We present preliminary results indicating that our word-spotting architecture achieves a detection rate of 70-95% at false positive rates of about 0.25-2 false positives per minute.
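The local pooling idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pooled_patch_response`, the mean-removed normalized correlation used as the similarity measure, and the window half-widths `df` and `dt` are all assumptions chosen for illustration. The sketch scores a stored patch against a spectrogram at every offset within a small frequency-time window around the patch's original location and keeps the maximum.

```python
import numpy as np

def pooled_patch_response(spec, patch, f0, t0, df=1, dt=2):
    """Max similarity of `patch` over a local window around (f0, t0).

    spec  : 2-D array (frequency bins x time frames), e.g. a mel-spectrum
    patch : 2-D array cut from an exemplar spectrum at row f0, column t0
    df, dt: pooling half-widths in frequency and time (hypothetical values)
    """
    ph, pw = patch.shape
    n_freq, n_time = spec.shape
    p = patch - patch.mean()          # remove the mean before correlating
    p_norm = np.linalg.norm(p)
    best = -1.0                       # lowest possible correlation value
    for f in range(max(0, f0 - df), min(n_freq - ph, f0 + df) + 1):
        for t in range(max(0, t0 - dt), min(n_time - pw, t0 + dt) + 1):
            w = spec[f:f + ph, t:t + pw]
            w = w - w.mean()
            denom = p_norm * np.linalg.norm(w)
            score = float((p * w).sum() / denom) if denom > 0 else 0.0
            best = max(best, score)   # max-pool over the local window
    return best
```

Under these assumptions, one pooled response per patch, taken in the patches' original order, would form the feature vector that a support vector machine is trained on. The pooling window is what lets a patch still match when the same sound occurs slightly earlier, later, higher, or lower in a new utterance.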
Bibliographic reference. Ezzat, Tony / Poggio, Tomaso (2008): "Discriminative word-spotting using ordered spectro-temporal patch features", In SAPA-2008, 35-40.