This paper describes several approaches to keyword spotting (KWS) for informal continuous speech. We compare acoustic keyword spotting, spotting in word lattices generated by large-vocabulary continuous speech recognition, and a hybrid approach that uses phoneme lattices generated by a phoneme recognizer. The systems are compared on carefully defined test data extracted from the ICSI meeting database. The acoustic and phoneme-lattice-based KWS systems rely on a phoneme recognizer that uses temporal pattern (TRAP) feature extraction and posterior estimation with neural networks. We show that this recognizer is superior to traditional HMM/GMM systems. The advantages and drawbacks of the different approaches are discussed.
Cite as: Szoke, I., Schwarz, P., Matejka, P., Burget, L., Karafiat, M., Fapso, M., Cernocky, J. (2005) Comparison of keyword spotting approaches for informal continuous speech. Proc. Interspeech 2005, 633-636, doi: 10.21437/Interspeech.2005-69
@inproceedings{szoke05_interspeech,
  author={Igor Szoke and Petr Schwarz and Pavel Matejka and Lukas Burget and Martin Karafiat and Michal Fapso and Jan Cernocky},
  title={{Comparison of keyword spotting approaches for informal continuous speech}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={633--636},
  doi={10.21437/Interspeech.2005-69}
}