Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)

St. Petersburg, Russia
May 14-16, 2014

Cross-Word Sub-Word Units for Low-Resource Keyword Spotting

William Hartmann, Lori Lamel, Jean-Luc Gauvain

Spoken Language Processing Group, LIMSI-CNRS, Orsay, France

We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. Sub-word units based on morphological decomposition and character n-grams are compared. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cross-word sub-word units achieve performance on OOV keywords similar to that of other types of sub-word units, but can be combined to produce further gains. We also show that sub-word units can be used to improve detection of in-vocabulary keywords. System combination provides an 18% relative gain in ATWV with the best two systems, and 25% with the best three systems.
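
To make the distinction between within-word and cross-word units concrete, the following is a minimal sketch of character n-gram segmentation, not necessarily the construction used in the paper. The function names, the `_` boundary marker, and the choice to keep short words whole are all assumptions made for illustration.

```python
def within_word_ngrams(text, n):
    """Character n-grams that never span a word boundary (illustrative only)."""
    units = []
    for word in text.split():
        if len(word) <= n:
            # Assumption: words no longer than n are kept as a single unit.
            units.append(word)
        else:
            units.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return units


def cross_word_ngrams(text, n, boundary="_"):
    """Character n-grams over the whole phrase, so units may span a boundary.

    The boundary marker is a hypothetical convention for this sketch.
    """
    joined = boundary.join(text.split())
    if len(joined) <= n:
        return [joined]
    return [joined[i:i + n] for i in range(len(joined) - n + 1)]


phrase = "bu ev"  # Turkish: "this house"
print(within_word_ngrams(phrase, 3))  # ['bu', 'ev']
print(cross_word_ngrams(phrase, 3))   # ['bu_', 'u_e', '_ev']
```

In this toy case every cross-word unit contains the boundary marker, so a keyword made of two short words is covered by units that a within-word segmentation cannot produce.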

Index Terms: keyword search, spoken term detection, OOV, sub-word lexical units, low-resource LVCSR


Bibliographic reference. Hartmann, William / Lamel, Lori / Gauvain, Jean-Luc (2014): "Cross-word sub-word units for low-resource keyword spotting", in SLTU-2014, 112-117.