Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Selection for Acoustic Coverage from Unlimited Speech Extracted from Closed-Captioned TV

Photina Jaeyun Jang, Alexander G. Hauptmann

Computer Science Dept, Carnegie Mellon University, Pittsburgh, PA, USA

Given unlimited amounts of speech training data, it is desirable to predict informative subsets that will still improve the resulting acoustic model. We present a triphone frequency threshold measure for predicting informative subsets from vast amounts of speech. Results with single-pass decoding show that acoustic models built from our selected speech set perform better than models trained on similar amounts of non-selected speech, and perform comparably to models built from the original, larger amount of speech.
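
The abstract does not spell out how the triphone frequency threshold is computed or applied, so the following is only an illustrative sketch of one way such a measure could drive selection, not the authors' procedure. It assumes each utterance is already available as a phone sequence (for example, from aligned captions) and keeps an utterance only if it contains at least one triphone whose count in the selected set is still below a threshold. The names `triphones`, `select_utterances`, and `threshold` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Triphone = Tuple[str, ...]


def triphones(phones: Sequence[str]) -> List[Triphone]:
    """Return every (left, center, right) phone triple in an utterance."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def select_utterances(
    utterances: Iterable[Sequence[str]], threshold: int
) -> List[int]:
    """Greedy selection (an assumed scheme, not taken from the paper).

    An utterance is kept if at least one of its triphones has been seen
    fewer than `threshold` times among the utterances kept so far.
    Returns the indices of the kept utterances.
    """
    counts: Counter = Counter()
    selected: List[int] = []
    for idx, phones in enumerate(utterances):
        tris = triphones(phones)
        # Utterances with fewer than three phones yield no triphones
        # and are therefore never selected.
        if any(counts[t] < threshold for t in tris):
            selected.append(idx)
            counts.update(tris)
    return selected


if __name__ == "__main__":
    corpus = [
        ["k", "ae", "t"],
        ["k", "ae", "t"],
        ["k", "ae", "t"],
        ["d", "ao", "g"],
    ]
    print(select_utterances(corpus, threshold=2))
```

Under this assumed scheme, utterances that only repeat already well-covered triphones are skipped, which is one way a subset could stay small while preserving acoustic coverage.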


Bibliographic reference.  Jang, Photina Jaeyun / Hauptmann, Alexander G. (1999): "Selection for acoustic coverage from unlimited speech extracted from closed-captioned TV", In EUROSPEECH'99, 659-662.