15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Comparing Decoding Strategies for Subword-Based Keyword Spotting in Low-Resourced Languages

William Hartmann (1), Viet-Bac Le (2), Abdel Messaoudi (2), Lori Lamel (1), Jean-Luc Gauvain (1)

(1) CNRS/LIMSI, Spoken Language Processing Group, Orsay, France
(2) Vocapia Research, Orsay, France

For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem, both for transcription and keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for each subword type, and 3) a single decoding using all possible subword units. In these experiments, the best performance is achieved by carrying out a separate decoding for each subword type. Further gains are attained through system combination. We also find that ignoring word boundaries improves the detection of OOV keywords without significantly impacting in-vocabulary keyword detection. Results are presented on four languages from the IARPA Babel Program (Haitian Creole, Assamese, Bengali, and Zulu).
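
The effect of ignoring word boundaries can be illustrated with a minimal sketch. It is not the authors' system: it searches a single 1-best word sequence rather than a lattice, and the segmentation table `SEGMENTS`, the helper `to_subwords`, and the function `find_keyword` are hypothetical names introduced only for this illustration. The assumption is that an OOV keyword may be recognized as a sequence of shorter in-vocabulary words, so a subword match that is allowed to span word boundaries can still recover it.

```python
from typing import Dict, List, Tuple

# Hypothetical word-to-subword segmentation table (illustrative only).
SEGMENTS: Dict[str, List[str]] = {
    "the": ["the"],
    "key": ["key"],
    "board": ["board"],
    "keyboard": ["key", "board"],
}


def to_subwords(words: List[str]) -> List[Tuple[str, bool]]:
    """Expand words into subword units; the flag marks a word-initial unit."""
    units: List[Tuple[str, bool]] = []
    for word in words:
        pieces = SEGMENTS.get(word, list(word))  # fall back to characters
        for i, piece in enumerate(pieces):
            units.append((piece, i == 0))
    return units


def find_keyword(hyp_words: List[str], keyword: str,
                 ignore_boundaries: bool) -> List[int]:
    """Return the subword start positions where the keyword matches."""
    units = to_subwords(hyp_words)
    target = SEGMENTS.get(keyword, list(keyword))
    n = len(target)
    hits: List[int] = []
    for start in range(len(units) - n + 1):
        window = units[start:start + n]
        if [unit for unit, _ in window] != target:
            continue
        if not ignore_boundaries:
            # The match must cover exactly one hypothesized word.
            begins_word = window[0][1]
            crosses_word = any(flag for _, flag in window[1:])
            ends_word = start + n == len(units) or units[start + n][1]
            if not (begins_word and ends_word and not crosses_word):
                continue
        hits.append(start)
    return hits


# "keyboard" is treated as OOV and was recognized as two shorter words.
hypothesis = ["the", "key", "board"]
strict_hits = find_keyword(hypothesis, "keyboard", ignore_boundaries=False)
relaxed_hits = find_keyword(hypothesis, "keyboard", ignore_boundaries=True)
```

In this toy case the boundary-respecting search finds nothing, while the boundary-free search finds the keyword at subword position 1. A keyword that was recognized as a single word matches under both settings.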


Bibliographic reference. Hartmann, William / Le, Viet-Bac / Messaoudi, Abdel / Lamel, Lori / Gauvain, Jean-Luc (2014): "Comparing decoding strategies for subword-based keyword spotting in low-resourced languages", In INTERSPEECH-2014, 2764-2768.