We present improvements to a keyword spotting (KWS) system that operates in highly adverse channel conditions with very low signal-to-noise ratios. We combine the outputs of multiple large vocabulary continuous speech recognition (LVCSR) systems. These systems are complementary thanks to different design decisions across all levels of information: three speech activity detection systems; a wide range of front-end signal processing features (standard cepstral and filter-bank features, noise-robust features, and multi-layer perceptron features); three statistical acoustic model types (Gaussian mixture models, deep neural networks, and convolutional neural networks); and two keyword search strategies (word-based and phone-based). We explore the scenario where the keywords are known in advance by adding them to the language model and assigning higher weights to n-grams that contain keywords. The scores of the individual systems are fused by a logistic-regression-based classifier to produce the final system combination output. We present the performance of our system in the Phase III evaluations of DARPA's Robust Automatic Transcription of Speech (RATS) program for Levantine Arabic and Farsi conversational speech corpora.
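The score-fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes every candidate detection has one score from each of three hypothetical systems, uses made-up toy scores, and fits a plain logistic regression by batch gradient descent. The names `train_fusion` and `fuse` are illustrative only, and real fusion would also have to handle detections that some systems miss.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression fusion weights by batch gradient descent.

    scores: list of score vectors, one entry per system (assumed layout).
    labels: 1 for a true keyword hit, 0 for a false alarm.
    """
    n_systems = len(scores[0])
    n = len(scores)
    weights = [0.0] * n_systems
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_systems
        grad_b = 0.0
        for x, y in zip(scores, labels):
            # Prediction error for this detection under the current model.
            err = fuse(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(weights, bias, x):
    # Fused posterior that the candidate detection is a true hit.
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


# Toy, made-up scores from three hypothetical systems (not data from the paper).
train_scores = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.4, 0.3],
    [0.3, 0.1, 0.2],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_fusion(train_scores, train_labels)
hit = fuse(weights, bias, [0.85, 0.75, 0.80])
miss = fuse(weights, bias, [0.15, 0.20, 0.25])
```

The fused value can then be thresholded to accept or reject each candidate keyword detection. The threshold itself would be tuned on held-out data.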
Bibliographic reference. Hout, Julien van / Mitra, Vikramjit / Lei, Yun / Vergyri, Dimitra / Graciarena, Martin / Mandal, Arindam / Franco, Horacio (2014): "Recent improvements in SRI's keyword detection system for noisy audio", In INTERSPEECH-2014, 1727-1731.