ISCA Archive Interspeech 2009

A comparison of query-by-example methods for spoken term detection

Wade Shen, Christopher M. White, Timothy J. Hazen

In this paper we examine an alternative interface for phonetic search, namely query-by-example, that avoids OOV issues associated with both standard word-based and phonetic search methods. We develop three methods that compare query lattices derived from example audio against a standard n-gram-based phonetic index, and we analyze factors affecting the performance of these systems. We show that the best systems under this paradigm are able to achieve 77% precision when retrieving utterances from conversational telephone speech and returning 10 results from a single query (performance that is better than a similar dictionary-based approach), suggesting significant utility for applications requiring high precision. We also show that these systems can be further improved using relevance feedback: by incorporating four additional queries, the precision of the best system can be improved by 13.7% relative. Our systems perform well despite high phone recognition error rates (> 40%) and make use of no pronunciation or letter-to-sound resources.
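The abstract reports precision over the top 10 returned results and a relative improvement from relevance feedback. As a minimal sketch of how such figures are commonly computed (not the authors' evaluation code), the Python below defines both quantities. The function names `precision_at_k` and `relative_improvement`, the utterance identifiers, and the toy values are all illustrative assumptions and do not come from the paper.

```python
def precision_at_k(ranked_hits, relevant, k=10):
    """Fraction of the top-k returned items that are relevant.

    ranked_hits: utterance IDs ordered best-first (hypothetical IDs).
    relevant:    set of utterance IDs that truly contain the query term.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_hits[:k]
    # Divide by k (not len(top)) so that returning fewer than k
    # results is not rewarded.
    return sum(1 for hit in top if hit in relevant) / k


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Toy data, invented for illustration only.
ranked = ["u1", "u2", "u3", "u4"]
truth = {"u1", "u3"}
p_at_4 = precision_at_k(ranked, truth, k=4)
gain = relative_improvement(0.5, 0.6)
```

Note that a relative improvement scales the baseline rather than adding percentage points to it: in the toy values above, moving from 0.5 to 0.6 is a gain of about 20% relative, but only 10 points absolute.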


doi: 10.21437/Interspeech.2009-612

Cite as: Shen, W., White, C.M., Hazen, T.J. (2009) A comparison of query-by-example methods for spoken term detection. Proc. Interspeech 2009, 2143-2146, doi: 10.21437/Interspeech.2009-612

@inproceedings{shen09b_interspeech,
  author={Wade Shen and Christopher M. White and Timothy J. Hazen},
  title={{A comparison of query-by-example methods for spoken term detection}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2143--2146},
  doi={10.21437/Interspeech.2009-612}
}