Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Robust Information Extraction from Spoken Language Data

David D. Palmer (2), Mari Ostendorf (1), John D. Burger (2)

(1) Electrical and Computer Engineering Department, Boston University, Boston, MA, USA
(2) The MITRE Corporation, Bedford, MA, USA

In this paper we address the problem of information extraction from speech data, with a focus on improving robustness to automatic recognition errors. We describe a baseline probabilistic model that uses word-class smoothing in a phrase n-gram language model. The model is adjusted to the error characteristics of a speech recognizer by inserting error tokens in the training data and by using word confidences in decoding to account for possible errors in the recognition output. Experiments show improved performance when training and test conditions are matched.
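
The two adaptation steps named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the procedure from the paper: the token name `<err>`, the helper names `insert_error_tokens` and `word_score`, the random per-word substitution, and the linear interpolation by confidence are all hypothetical.

```python
import random

# Hypothetical symbol standing in for a misrecognized word.
ERROR_TOKEN = "<err>"


def insert_error_tokens(words, error_rate, rng):
    """Replace each training word with ERROR_TOKEN with probability error_rate.

    Assumption: errors are simulated independently per word. A real system
    could instead mark the positions where recognizer output disagrees with
    the reference transcript.
    """
    return [ERROR_TOKEN if rng.random() < error_rate else w for w in words]


def word_score(p_word, p_error, confidence):
    """Combine two language model probabilities using a word confidence.

    p_word:     model probability of the recognized word in its context
    p_error:    model probability of ERROR_TOKEN in the same context
    confidence: recognizer's estimate in [0, 1] that the word is correct

    Assumption: a simple linear interpolation between the two hypotheses.
    """
    return confidence * p_word + (1.0 - confidence) * p_error


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the president met reporters in boston".split()
    print(insert_error_tokens(sentence, 0.2, rng))
    print(word_score(p_word=0.2, p_error=0.05, confidence=0.7))
```

With a confidence of 1 the score reduces to the ordinary word probability, and with a confidence of 0 the word is treated entirely as an error token, which is the intuition behind matching training and test conditions.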


Bibliographic reference.  Palmer, David D. / Ostendorf, Mari / Burger, John D. (1999): "Robust information extraction from spoken language data", In EUROSPEECH'99, 1035-1038.