Sixth European Conference on Speech Communication and Technology
Given unlimited amounts of speech training data, it is desirable to predict informative subsets that will still improve the resulting acoustic model. We present a triphone frequency threshold measure for predicting informative subsets from vast amounts of speech. Results with single-pass decoding show that acoustic models built from our selection-based speech set perform better than models trained on similar amounts of non-selected speech, and perform similarly to models built from the original, larger amount of speech.
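The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible reading of a triphone frequency threshold: an utterance is kept if it contains at least one triphone that has so far been seen fewer than `threshold` times in the selected set. The function names, the greedy one-pass order, and the input format (utterance id plus phone sequence) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def triphones(phones):
    """Return every run of three consecutive phones as a tuple."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def select_utterances(utterances, threshold):
    """Greedy one-pass selection (hypothetical, not the paper's exact method).

    utterances: iterable of (utterance_id, list_of_phones)
    threshold:  keep an utterance while any of its triphones has been
                counted fewer than this many times in the selected set
    """
    counts = Counter()   # triphone counts over the selected utterances only
    selected = []
    for utt_id, phones in utterances:
        tris = triphones(phones)
        # An utterance is informative if it adds an under-represented triphone.
        if any(counts[t] < threshold for t in tris):
            selected.append(utt_id)
            counts.update(tris)
    return selected, counts


if __name__ == "__main__":
    data = [
        ("u1", ["a", "b", "c", "d"]),
        ("u2", ["a", "b", "c", "d"]),   # adds no new triphone coverage
        ("u3", ["b", "c", "d", "e"]),   # contributes the triphone (c, d, e)
    ]
    chosen, _ = select_utterances(data, threshold=1)
    print(chosen)
```

Under this reading, raising `threshold` admits more utterances per triphone and so trades a larger training set for denser coverage; the values used in the paper are not stated in the abstract.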
Bibliographic reference. Jang, Photina Jaeyun / Hauptmann, Alexander G. (1999): "Selection for acoustic coverage from unlimited speech extracted from closed-captioned TV", In EUROSPEECH'99, 659-662.