For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem for both transcription and keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for each subword type, and 3) performing a single decoding using all possible subword units. In these experiments, the best performance is achieved by carrying out a separate decoding for each subword type. Further gains are attained through system combination. We also find that ignoring word boundaries improves the detection of OOV keywords without significantly impacting in-vocabulary keyword detection. Results are presented on four languages from the IARPA Babel Program (Haitian Creole, Assamese, Bengali, and Zulu).
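The effect of ignoring word boundaries can be illustrated with a minimal sketch. It is not the system described in the paper: it searches a single 1-best word sequence rather than a lattice, it assumes one character per subword unit, and the names `BOUNDARY`, `to_units`, `hypothesis_units`, and `find_keyword` are hypothetical. The point is only that a keyword absent from the word vocabulary can still be matched when it spans two recognized words, provided the boundary marker is dropped.

```python
from typing import List

BOUNDARY = "<w>"  # hypothetical word-boundary marker, not taken from the paper


def to_units(word: str) -> List[str]:
    # Assumption: each character is one subword unit. The paper's actual
    # subword types are not specified in the abstract.
    return list(word)


def hypothesis_units(words: List[str], keep_boundaries: bool) -> List[str]:
    # Convert a decoded word sequence to a subword sequence, optionally
    # inserting a boundary marker between consecutive words.
    units: List[str] = []
    for i, word in enumerate(words):
        if keep_boundaries and i > 0:
            units.append(BOUNDARY)
        units.extend(to_units(word))
    return units


def find_keyword(words: List[str], keyword: str, keep_boundaries: bool) -> List[int]:
    # Return every start index at which the keyword's subword sequence
    # occurs contiguously in the hypothesis subword sequence.
    hyp = hypothesis_units(words, keep_boundaries)
    query = to_units(keyword)
    span = len(query)
    return [i for i in range(len(hyp) - span + 1) if hyp[i:i + span] == query]


# "newtown" is treated as OOV; the recognizer output splits it into two words.
hyp_words = ["the", "new", "town", "hall"]
print(find_keyword(hyp_words, "newtown", keep_boundaries=True))   # []
print(find_keyword(hyp_words, "newtown", keep_boundaries=False))  # [3]
```

With boundaries kept, the marker between "new" and "town" blocks the match; with boundaries ignored, the keyword is found at subword index 3. A real system would search lattices and score each detection, which this sketch omits.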
Bibliographic reference. Hartmann, William / Le, Viet-Bac / Messaoudi, Abdel / Lamel, Lori / Gauvain, Jean-Luc (2014): "Comparing decoding strategies for subword-based keyword spotting in low-resourced languages", In INTERSPEECH-2014, 2764-2768.