Information retrieval from speech is a key technology for many applications, as it allows access to large amounts of audio data. This technology requires two major components: an automatic speech recognizer (ASR) and a text-based information retrieval module such as a keyword extractor or a named entity recognizer (NER). When the two components are combined, the resulting application needs to be optimized globally. However, ASR and information retrieval are usually developed and optimized separately. The ASR tends to be optimized to reduce the word error rate (WER), a metric that does not take into account the contextual and syntactic roles of words, even though these roles are valuable information for information retrieval systems. In this paper we investigate different ways to tune the ASR for a speech-based NER system. In an end-to-end configuration, we also test several ASR metrics, including WER, NE-WER and ATENE, as well as the use of an oracle during the development step. Our results show that using an NER oracle to tune the system reduces the named entity recognition error rate by more than 1% absolute, and that using the ATENE metric reduces it by more than 0.75%. We also show that these optimization approaches favor a higher ASR language model weight, which yields an overall gain in NER performance despite a local increase in WER.
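The contrast between tuning for WER and tuning for an NE-oriented criterion can be illustrated with a toy example. The Python sketch below assumes a decoder that scores each n-best hypothesis as acoustic log-probability plus an LM weight times the LM log-probability. This is a common formulation, not necessarily the paper's exact setup. It grid-searches that weight on a development set. The `ne_miss_rate` function is a hypothetical, crude stand-in for an NE-aware metric; it is not NE-WER, ATENE, or the authors' NER oracle. All function names, the n-best format, and the scores are invented for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Words = List[str]
# Hypothetical n-best entry format: (words, acoustic log-prob, LM log-prob).
NBestEntry = Tuple[Words, float, float]
Metric = Callable[[Sequence[Words], Sequence[Words]], float]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def wer(refs: Sequence[Words], hyps: Sequence[Words]) -> float:
    """Corpus WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def ne_miss_rate(refs: Sequence[Words], hyps: Sequence[Words],
                 entities: FrozenSet[str]) -> float:
    """Hypothetical toy metric: share of reference entity words absent from the
    hypothesis. It is NOT the paper's NE-WER or ATENE."""
    total = missed = 0
    for r, h in zip(refs, hyps):
        for w in r:
            if w in entities:
                total += 1
                missed += (w not in h)
    return missed / total if total else 0.0


def decode(nbest: Sequence[NBestEntry], lm_weight: float) -> Words:
    """Pick the hypothesis maximizing acoustic score + lm_weight * LM score."""
    return max(nbest, key=lambda e: e[1] + lm_weight * e[2])[0]


def tune_lm_weight(nbests: Sequence[Sequence[NBestEntry]], refs: Sequence[Words],
                   weights: Sequence[float], metric: Metric) -> Tuple[float, float]:
    """Grid-search the LM weight that minimizes `metric` on a development set."""
    best_w, best_score = weights[0], float("inf")
    for w in weights:
        score = metric(refs, [decode(nb, w) for nb in nbests])
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score


if __name__ == "__main__":
    refs = ["obama visited paris today".split()]
    entities = frozenset({"obama", "paris"})
    nbests = [[
        ("obama visited pairs today".split(), -20.0, -12.0),  # entity error, WER 0.25
        ("obama visits paris".split(), -22.0, -9.0),          # no entity error, WER 0.5
    ]]
    weights = [0.5, 1.0, 1.5]
    print(tune_lm_weight(nbests, refs, weights, wer))  # → (0.5, 0.25)
    print(tune_lm_weight(nbests, refs, weights,
                         lambda rs, hs: ne_miss_rate(rs, hs, entities)))  # → (1.0, 0.0)
```

On this invented data, tuning for WER selects the lower LM weight. The NE-oriented toy metric selects a higher weight whose output has a higher WER but keeps both entity words. This is only a miniature analogue of the trade-off described above, not a reproduction of the paper's experiments.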
Cite as: Jannet, M.A.B., Galibert, O., Adda-Decker, M., Rosset, S. (2017) Investigating the Effect of ASR Tuning on Named Entity Recognition. Proc. Interspeech 2017, 2486-2490, doi: 10.21437/Interspeech.2017-1482
@inproceedings{jannet17_interspeech,
  author={Mohamed Ameur Ben Jannet and Olivier Galibert and Martine Adda-Decker and Sophie Rosset},
  title={{Investigating the Effect of ASR Tuning on Named Entity Recognition}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2486--2490},
  doi={10.21437/Interspeech.2017-1482}
}