ISCA Archive Interspeech 2013

Heuristic selection of training sentences from historical TV guide for semi-supervised LM adaptation

Harry M. Chang

This paper describes a novel approach to automatically selecting training sentences from a system-generated data feed in order to develop the high-precision language models (LMs) required for speech-enabled voice interface applications in the TV search domain. We develop a set of heuristic rules that select training sentences directly from the TV electronic programming guide (EPG) in its metadata form. The training corpus constructed by applying these selection algorithms to historical EPG data gives the adapted LMs considerably lower perplexity and a significant reduction in word error rate (WER). When evaluated on user-generated spoken queries to an experimental TV search application, the adapted LMs achieve a 10% absolute reduction in WER over baseline LMs created without the training sentences generated from the historical EPG data.
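
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. It is not the paper's evaluation code; the function name `word_error_rate` and the example queries are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical spoken query and recognizer output (not from the paper).
wer = word_error_rate("watch the big bang theory", "watch a big band theory")
print(f"WER = {wer:.2f}")
```

Under this definition, an absolute reduction is a difference in percentage points (for example, from 30% to 20%), whereas a relative reduction would be expressed as a fraction of the baseline WER.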


doi: 10.21437/Interspeech.2013-524

Cite as: Chang, H.M. (2013) Heuristic selection of training sentences from historical TV guide for semi-supervised LM adaptation. Proc. Interspeech 2013, 2227-2231, doi: 10.21437/Interspeech.2013-524

@inproceedings{chang13c_interspeech,
  author={Harry M. Chang},
  title={{Heuristic selection of training sentences from historical TV guide for semi-supervised LM adaptation}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2227--2231},
  doi={10.21437/Interspeech.2013-524}
}