Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)
St. Petersburg, Russia
We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. Sub-word units based on morphological decomposition and character ngrams are compared. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cross-word subword units achieve similar performance on OOV keywords as other types of sub-word units, but can be combined to produce further gains. We also show that sub-word units can be used to improve detection of in-vocabulary keywords. System combination provides a 18% relative gain in ATWV with the best two systems, and 25% with the best three systems.
Index Terms: keyword search, spoken term detection, OOV, sub-word lexical units, low resource LVCSR
Bibliographic reference. Hartmann, William / Lamel, Lori / Gauvain, Jean-Luc (2014): "Cross-word sub-word units for low-resource keyword spotting", In SLTU-2014, 112-117.