ISCA Archive Interspeech 2005

The effects of speech recognition and punctuation on information extraction performance

John Makhoul, Alex Baron, Ivan Bulyko, Long Nguyen, Lance Ramshaw, David Stallard, Richard Schwartz, Bing Xiang

We report on experiments to measure the effect of speech recognition errors and automatic punctuation insertion errors on the performance of information extraction (entity and relation extraction). The outputs of several recognition systems with a range of word error rates (WER), along with punctuation insertion, were fed into a system that extracts entities and relations from the recognized text. Entity and relation value scores were measured as a function of WER and types of punctuation used. The results of the experiments showed that both entity and relation value scores degrade linearly with increasing WER, with a relative reduction in scores of about twice the WER. The information extraction modules require the inclusion of sentence boundaries, at a minimum; however, the experiments showed that the exact locations of these boundaries are not important for entity and relation extraction. In contrast, when comparing the effects of full punctuation to just automatic sentence boundary insertion, there was a loss in entity value scores of 13.5% and in relation value scores of 25%. Further, commas play a significantly greater role in entity and relation extraction than other types of punctuation.
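The rule of thumb stated in the abstract (a relative reduction in extraction scores of about twice the WER) can be illustrated with a minimal sketch. The function names `word_error_rate` and `estimated_relative_loss` are hypothetical and not taken from the paper. WER is computed here as the standard word-level edit distance divided by the reference length. The slope of 2.0 is only the approximate figure quoted in the abstract, and clamping the estimate at 1.0 is an assumption added for illustration. The paper's actual entity and relation value scoring is not reproduced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def estimated_relative_loss(wer: float, slope: float = 2.0) -> float:
    """Rough relative reduction in extraction score for a given WER.

    Assumption: a linear relation with slope ~2, as the abstract reports;
    the cap at 1.0 is added here and is not from the paper.
    """
    return min(1.0, slope * wer)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
    print(f"Estimated relative score loss: {estimated_relative_loss(wer):.3f}")
```

Under these assumptions, a recognizer with a WER of 0.10 would be expected to lose roughly 20% of the entity or relation value score obtained on reference transcripts.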


doi: 10.21437/Interspeech.2005-53

Cite as: Makhoul, J., Baron, A., Bulyko, I., Nguyen, L., Ramshaw, L., Stallard, D., Schwartz, R., Xiang, B. (2005) The effects of speech recognition and punctuation on information extraction performance. Proc. Interspeech 2005, 57-60, doi: 10.21437/Interspeech.2005-53

@inproceedings{makhoul05_interspeech,
  author={John Makhoul and Alex Baron and Ivan Bulyko and Long Nguyen and Lance Ramshaw and David Stallard and Richard Schwartz and Bing Xiang},
  title={{The effects of speech recognition and punctuation on information extraction performance}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={57--60},
  doi={10.21437/Interspeech.2005-53}
}