12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

OOV Sensitive Named-Entity Recognition in Speech

Carolina Parada, Mark Dredze, Frederick Jelinek

Johns Hopkins University, USA

Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult because LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named entities and always produce transcription errors. In this work, we improve speech NER by including features from an OOV detector that indicate likely OOV regions. These features allow the tagger to identify regions of speech containing named entities even when those entities are incorrectly transcribed. We construct a new speech NER data set and demonstrate significant improvements for this task.
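The abstract does not specify the feature set or the tagger, so the following is only a minimal sketch of the general idea of feeding OOV-detector output into a feature-based NER tagger. It assumes a hypothetical detector that assigns each decoded word a posterior probability of being part of an OOV region, and a sequence tagger (for example a CRF) that consumes one feature dictionary per token. The names `DecodedToken`, `oov_bin`, `token_features`, and `sentence_features`, the 0.5 threshold, and the window size are illustrative assumptions, not the authors' method. Neighbouring-word features are included because a single OOV word may be decoded as several in-vocabulary words, so the OOV signal is spread over a region rather than one token.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DecodedToken:
    word: str        # word hypothesized by the LVCSR
    oov_prob: float  # hypothetical OOV-detector posterior in [0, 1]


def oov_bin(p: float, threshold: float = 0.5) -> str:
    """Discretize an OOV posterior into a coarse categorical feature."""
    if p >= threshold:
        return "high"
    if p >= threshold / 2:
        return "mid"
    return "low"


def token_features(tokens: List[DecodedToken], i: int, window: int = 1,
                   threshold: float = 0.5) -> Dict[str, object]:
    """Lexical features plus (assumed) OOV-detector features for token i."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": tok.word.lower(),
        "word.suffix3": tok.word[-3:].lower(),
        "oov.bin": oov_bin(tok.oov_prob, threshold),
    }
    # OOV evidence from neighbouring words, padded at sentence boundaries.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"oov.bin[{offset:+d}]"
        if 0 <= j < len(tokens):
            feats[key] = oov_bin(tokens[j].oov_prob, threshold)
        else:
            feats[key] = "pad"
    # True if any token in the window looks like part of an OOV region.
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    feats["oov.region"] = any(t.oov_prob >= threshold for t in tokens[lo:hi])
    return feats


def sentence_features(tokens: List[DecodedToken], window: int = 1,
                      threshold: float = 0.5) -> List[Dict[str, object]]:
    """One feature dictionary per decoded token, in order."""
    return [token_features(tokens, i, window, threshold)
            for i in range(len(tokens))]


if __name__ == "__main__":
    # Made-up decoded sentence: an OOV name misrecognized as "pair a da".
    decoded = [DecodedToken(w, p) for w, p in [
        ("met", 0.05), ("with", 0.1), ("pair", 0.8),
        ("a", 0.7), ("da", 0.9), ("today", 0.2)]]
    for tok, f in zip(decoded, sentence_features(decoded)):
        print(tok.word, f["oov.bin"], f["oov.region"])
```

Binning the posterior into a few categories suits taggers built on indicator features; a tagger that accepts real-valued features could use `oov_prob` directly instead.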


Bibliographic reference.  Parada, Carolina / Dredze, Mark / Jelinek, Frederick (2011): "OOV sensitive named-entity recognition in speech", In INTERSPEECH-2011, 2085-2088.