We propose a method for finding keywords in an audio database using a spoken query. Our method performs a joint alignment between a phone lattice generated from the spoken query utterance and a second phone lattice representing the long utterance to be searched. We implement this joint alignment procedure in a graphical-models framework. We evaluate our system on TIMIT as well as on the Switchboard conversational telephone speech (CTS) corpus. Our results show that a phone lattice representation of the spoken query achieves higher performance than a 1-best phone sequence representation alone.
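To illustrate the underlying idea, the following is a minimal sketch of the 1-best baseline that the lattice method is compared against: approximately matching a query phone sequence inside a longer utterance phone sequence via edit-distance dynamic programming with a free start position. This is not the paper's graphical-models lattice alignment, only a simplified single-sequence analogue; the phone labels and function name are hypothetical.

```python
def spot_keyword(query, utterance):
    """Approximate search for a short phone sequence (query) inside a
    longer phone sequence (utterance).  Uses edit-distance DP where the
    match may begin at any utterance position for free (row 0 is all
    zeros).  Returns (best_cost, end_index) of the best local match."""
    m, n = len(query), len(utterance)
    prev = [0] * (n + 1)              # i = 0: empty query matches anywhere at cost 0
    for i in range(1, m + 1):
        cur = [i] + [0] * n           # aligning i query phones to nothing costs i
        for j in range(1, n + 1):
            sub = prev[j - 1] + (query[i - 1] != utterance[j - 1])
            cur[j] = min(sub,          # substitute / match
                         prev[j] + 1,  # skip a query phone
                         cur[j - 1] + 1)  # skip an utterance phone
        prev = cur
    best_j = min(range(n + 1), key=lambda j: prev[j])
    return prev[best_j], best_j


# Hypothetical phone strings: query "greasy" against a longer utterance
# whose decoder output contains the near-match "g r iy s ix".
query = ["g", "r", "iy", "s", "iy"]
utterance = ["dh", "ax", "g", "r", "iy", "s", "ix", "w", "aa", "sh"]
cost, end = spot_keyword(query, utterance)
print(cost)  # 1: one substitution (iy -> ix) suffices
```

A lattice-based query replaces the single `query` sequence with many weighted alternative phone paths, which is precisely what the joint lattice alignment exploits to recover from 1-best decoding errors.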
Bibliographic reference. Lin, Hui / Stupakov, Alex / Bilmes, Jeff A. (2008): "Spoken keyword spotting via multi-lattice alignment", In INTERSPEECH-2008, 2191-2194.