We present a state-of-the-art system for spoken term detection on continuous telephone speech in multiple languages. The system compiles a search index from deep word lattices generated by a large-vocabulary HMM speech recognizer. It estimates word posteriors from the lattices and uses them to compute a detection threshold that minimizes the expected value of a user-specified cost function. The system handles search terms outside the vocabulary of the speech-to-text engine by applying approximate string matching to induced phonetic transcripts. Its search index occupies less than 1 Mb per hour of processed speech, and searches over a corpus of hundreds of hours of audio complete in under a second. The system achieved the highest reported accuracy on the telephone speech portion of the 2006 NIST Spoken Term Detection evaluation, reaching 83% of the maximum possible accuracy score in English.
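The abstract does not state the cost function or the exact thresholding rule, so the following is only a minimal sketch of the general idea under stated assumptions. Accepting a candidate hit with posterior p costs (1 - p) times the false-alarm cost in expectation, while rejecting it costs p times the miss cost, so the expected cost is minimized by accepting whenever p exceeds C_fa / (C_fa + C_miss). As an illustrative (assumed) cost, the sketch uses an ATWV-like weighting in which a miss costs 1 / n_true, a false alarm costs beta / (T - n_true), n_true is estimated as the sum of the term's posteriors, T is the number of seconds of speech, and beta = 999.9, the value we believe the 2006 NIST STD evaluation used. The names `Hit`, `bayes_threshold`, `atwv_style_threshold`, and `detect` are hypothetical.

```python
"""Hypothetical sketch: a posterior threshold that minimizes expected cost."""

from dataclasses import dataclass


@dataclass
class Hit:
    """A candidate detection of a search term (hypothetical structure)."""
    start_sec: float
    posterior: float  # estimated probability that the term truly occurs here


def bayes_threshold(cost_miss: float, cost_false_alarm: float) -> float:
    """Posterior above which accepting a hit has lower expected cost than rejecting it.

    Accept:  expected cost = (1 - p) * cost_false_alarm
    Reject:  expected cost = p * cost_miss
    """
    return cost_false_alarm / (cost_false_alarm + cost_miss)


def atwv_style_threshold(hits, speech_seconds: float, beta: float = 999.9) -> float:
    """Term-specific threshold under an ASSUMED ATWV-like cost function.

    n_true (the number of true occurrences) is estimated as the sum of posteriors.
    """
    n_true = sum(h.posterior for h in hits)
    if n_true <= 0:
        return 1.0  # no evidence for the term: accept nothing
    cost_miss = 1.0 / n_true
    cost_false_alarm = beta / (speech_seconds - n_true)
    return bayes_threshold(cost_miss, cost_false_alarm)


def detect(hits, threshold: float):
    """Keep the hits whose posterior exceeds the threshold."""
    return [h for h in hits if h.posterior > threshold]


if __name__ == "__main__":
    hits = [Hit(1.0, 0.9), Hit(5.0, 0.6), Hit(9.0, 0.05)]
    thr = atwv_style_threshold(hits, speech_seconds=3600.0)
    print(round(thr, 3), [h.posterior for h in detect(hits, thr)])
```

With one hour of speech and these three posteriors, the estimated count is 1.55 and the threshold comes out near 0.30, so the two confident hits are kept. Note that the threshold depends on the term: a rarer term (smaller n_true) makes each miss more costly and therefore lowers the threshold.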
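The abstract says only that out-of-vocabulary terms are found by approximate string matching on induced phonetic transcripts. One standard way to do this, shown here purely as an assumed illustration, is approximate substring matching with unit-cost edit distance (Sellers' dynamic program): the query's phone sequence (for example, produced by a letter-to-sound model) is aligned against every substring of a phone transcript, and any end position whose cost is within a tolerance becomes a candidate hit. The function name `approx_phone_search`, the unit edit costs, and the toy phone symbols are all hypothetical.

```python
"""Hypothetical sketch: approximate substring search over a phone transcript."""


def approx_phone_search(query, transcript, max_edits):
    """Return (end_index, cost) pairs where `query` matches some substring of
    `transcript` ending just before end_index with at most `max_edits` unit-cost edits.

    D[i][j] = min edit cost between query[:i] and any substring of transcript
    that ends at position j. Row 0 is all zeros because a match may start anywhere.
    """
    n = len(transcript)
    prev = [0] * (n + 1)  # D[0][j] = 0 for every j
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * n  # D[i][0] = i: query prefix against an empty substring
        for j in range(1, n + 1):
            sub = 0 if query[i - 1] == transcript[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,        # delete a query phone
                cur[j - 1] + 1,     # insert an extra transcript phone
                prev[j - 1] + sub,  # match or substitute
            )
        prev = cur
    return [(j, prev[j]) for j in range(1, n + 1) if prev[j] <= max_edits]


if __name__ == "__main__":
    transcript = "dh ax k ae t s ae t".split()
    print(approx_phone_search(["k", "ae", "t"], transcript, max_edits=1))
```

Here the exact match "k ae t" ends at index 5 with cost 0, and "s ae t" ends at index 8 with one substitution. Neighboring end positions of the same match (indices 4 and 6) also fall within the tolerance, so a practical search would typically keep only local cost minima and could weight substitutions by phone confusability instead of using unit costs.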
Bibliographic reference. Miller, David R. H. / Kleber, Michael / Kao, Chia-Lin / Kimball, Owen / Colthurst, Thomas / Lowe, Stephen A. / Schwartz, Richard M. / Gish, Herbert (2007): "Rapid and accurate spoken term detection", In INTERSPEECH-2007, 314-317.