ISCA Archive Interspeech 2009

How to select a good training-data subset for transcription: submodular active selection for sequences

Hui Lin, Jeff Bilmes

Given a large untranscribed corpus of speech utterances, we address the problem of how to select a good subset for word-level transcription under a fixed transcription budget. We employ submodular active selection on a Fisher-kernel based graph over the untranscribed utterances. The selection is theoretically guaranteed to be near-optimal. Moreover, our approach is able to bootstrap without requiring any initial transcribed data, whereas traditional approaches rely heavily on the quality of an initial model trained on some labeled data. Our experiments on phone recognition show that our approach significantly outperforms both average-case random selection and uncertainty sampling.
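
To illustrate budgeted submodular selection over a similarity graph, here is a minimal sketch. It is not the authors' implementation: the abstract does not name the objective, so the sketch assumes a facility-location function, and the matrix `sim` is a hypothetical stand-in for non-negative pairwise utterance similarities (for example, weights derived from a Fisher kernel). For monotone submodular objectives under a cardinality budget, greedy selection of this kind is known to achieve at least a (1 - 1/e) fraction of the optimal value.

```python
import numpy as np


def greedy_facility_location(sim, budget):
    """Greedily select up to `budget` items maximizing the (assumed) objective
    f(S) = sum_i max_{j in S} sim[i, j], where sim is a non-negative
    n x n similarity matrix. Returns the selected indices in pick order."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    chosen = np.zeros(n, dtype=bool)   # True for items already selected
    coverage = np.zeros(n)             # best similarity of each item to the selection
    selected = []
    for _ in range(min(budget, n)):
        # Objective value after adding each candidate j, minus the current value.
        gains = np.maximum(sim, coverage[:, None]).sum(axis=0) - coverage.sum()
        gains[chosen] = -np.inf        # never pick the same item twice
        best = int(np.argmax(gains))
        selected.append(best)
        chosen[best] = True
        coverage = np.maximum(coverage, sim[:, best])
    return selected
```

Calling `greedy_facility_location(sim, budget)` needs only the similarity matrix and no labels or initial model, which mirrors the bootstrapping property described in the abstract.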


doi: 10.21437/Interspeech.2009-730

Cite as: Lin, H., Bilmes, J. (2009) How to select a good training-data subset for transcription: submodular active selection for sequences. Proc. Interspeech 2009, 2859-2862, doi: 10.21437/Interspeech.2009-730

@inproceedings{lin09e_interspeech,
  author={Hui Lin and Jeff Bilmes},
  title={{How to select a good training-data subset for transcription: submodular active selection for sequences}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2859--2862},
  doi={10.21437/Interspeech.2009-730}
}