ISCA Archive IberSPEECH 2022

Assessing Transfer Learning and automatically annotated data in the development of Named Entity Recognizers for new domains

Emanuel Matos, Mário Rodrigues, António Teixeira

With recent advances in Deep Learning, pretrained models, and Transfer Learning, the lack of labeled data has become the biggest bottleneck preventing the use of Named Entity Recognition (NER) in more domains and languages. To reduce the cost and time of creating annotated data for new domains, we recently proposed automatic annotation by an ensemble of NERs to obtain data for training a Bidirectional Encoder Representations from Transformers (BERT) based NER for Portuguese, and performed a first evaluation. Results demonstrated that the method has potential, but were limited to one domain. With the main objective of a more in-depth assessment of the method's capabilities, this paper presents: (1) an evaluation of the method in other domains; (2) an assessment of the generalization capabilities of the trained models, by applying them to new domains without retraining; (3) an assessment of additional training with in-domain data, also automatically annotated. The evaluation, performed using the test part of the MiniHAREM, Paramopama, and LeNER Portuguese datasets, confirmed the potential of the approach and demonstrated the capability of models previously trained for the tourism domain to recognize entities in new domains, with better performance for entities of types PERSON, LOCAL, and ORGANIZATION.
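The abstract does not detail how the ensemble of NERs produces training labels. A minimal sketch of one plausible realization, token-level majority voting over BIO tag sequences from several recognizers, is shown below; the function name, the `min_agreement` threshold, and the voting scheme are assumptions for illustration, not necessarily the authors' exact method.

```python
from collections import Counter

def ensemble_annotate(tag_sequences, min_agreement=2):
    """Merge token-level BIO tag sequences from several NER systems.

    tag_sequences: list of tag lists, one per NER system, all aligned to
    the same tokens. A non-'O' tag is kept only when at least
    `min_agreement` systems propose it; otherwise the token gets 'O'.
    Returns the merged tag list, usable as silver training data.
    """
    merged = []
    for tags_at_position in zip(*tag_sequences):
        tag, count = Counter(tags_at_position).most_common(1)[0]
        merged.append(tag if tag != "O" and count >= min_agreement else "O")
    return merged

# Example: three NERs tag the tokens ["Lisboa", "fica", "em", "Portugal"].
silver = ensemble_annotate([
    ["B-LOC", "O", "O", "B-LOC"],
    ["B-LOC", "O", "O", "B-LOC"],
    ["O",     "O", "O", "B-ORG"],
])
```

With `min_agreement=2`, only tags backed by at least two of the three systems survive, which trades some recall for cleaner silver labels before BERT fine-tuning.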


doi: 10.21437/IberSPEECH.2022-39

Cite as: Matos, E., Rodrigues, M., Teixeira, A. (2022) Assessing Transfer Learning and automatically annotated data in the development of Named Entity Recognizers for new domains. Proc. IberSPEECH 2022, 191-195, doi: 10.21437/IberSPEECH.2022-39

@inproceedings{matos22_iberspeech,
  author={Emanuel Matos and Mário Rodrigues and António Teixeira},
  title={{Assessing Transfer Learning and automatically annotated data in the development of Named Entity Recognizers for new domains}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={191--195},
  doi={10.21437/IberSPEECH.2022-39}
}