ISCA Archive SLTU 2014

Cross-word sub-word units for low-resource keyword spotting

William Hartmann, Lori Lamel, Jean-Luc Gauvain

We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. We compare sub-word units based on morphological decomposition with units based on character n-grams. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results show that cross-word sub-word units perform similarly to other types of sub-word units on OOV keywords, and that the different unit types can be combined for further gains. We also show that sub-word units can improve the detection of in-vocabulary keywords. System combination yields an 18% relative gain in ATWV with the best two systems and a 25% relative gain with the best three systems.
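To make the difference between within-word and cross-word sub-word units concrete, the sketch below splits a word sequence into fixed-length character chunks in two ways. It is a hypothetical illustration, not the authors' method: the abstract does not say how the character n-gram units are built. The non-overlapping fixed-length chunking, the boundary symbol "_", the continuation suffix "+", and all function names are assumptions introduced here.

```python
from typing import List

BOUNDARY = "_"   # hypothetical symbol marking a word boundary (assumed absent from words)
CONTINUES = "+"  # hypothetical suffix marking a non-final within-word unit


def _chunks(text: str, n: int) -> List[str]:
    """Split text into consecutive, non-overlapping pieces of at most n characters."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(0, len(text), n)]


def within_word_units(words: List[str], n: int) -> List[str]:
    """Chunk each word on its own, so no unit spans a word boundary."""
    units: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        pieces = _chunks(word, n)
        # Mark every piece except the last so the word can be reassembled.
        units.extend(p + CONTINUES for p in pieces[:-1])
        units.append(pieces[-1])
    return units


def cross_word_units(words: List[str], n: int) -> List[str]:
    """Join the words with BOUNDARY and chunk the whole string,
    so a unit may end inside one word and start the next."""
    return _chunks(BOUNDARY.join(words), n)


def words_from_cross_word_units(units: List[str]) -> List[str]:
    """Recover the word sequence by concatenating units and splitting on BOUNDARY."""
    return "".join(units).split(BOUNDARY)


if __name__ == "__main__":
    words = ["merhaba", "dunya"]
    print(within_word_units(words, 3))  # ['mer+', 'hab+', 'a', 'dun+', 'ya']
    print(cross_word_units(words, 3))   # ['mer', 'hab', 'a_d', 'uny', 'a']
```

The unit "a_d" is the cross-word case: it carries the end of one word, the boundary, and the start of the next. One consequence of this particular chunking scheme is that the segmentation of a word depends on its position in the utterance, whereas within-word chunking always segments the same word identically.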

Index Terms: keyword search, spoken term detection, OOV, sub-word lexical units, low resource LVCSR


Cite as: Hartmann, W., Lamel, L., Gauvain, J.-L. (2014) Cross-word sub-word units for low-resource keyword spotting. Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014), 112-117

@inproceedings{hartmann14_sltu,
  author={William Hartmann and Lori Lamel and Jean-Luc Gauvain},
  title={{Cross-word sub-word units for low-resource keyword spotting}},
  year=2014,
  booktitle={Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014)},
  pages={112--117}
}