Adverse Drug Reaction extraction on Electronic Health Records written in Spanish: A PhD Thesis overview

Sara Santiso

The aim of this work is the automatic extraction of Adverse Drug Reactions (ADRs) from Electronic Health Records (EHRs) written in Spanish. From a Natural Language Processing (NLP) perspective, this is approached as a relation extraction task in which the drug is the causative agent of a disease, the adverse reaction. Automatic extraction would help to increase ADR reporting and enable the earliest possible detection of ADRs, thereby helping to improve patients' health. ADR extraction from EHRs involves major challenges. First, drugs and diseases found in an EHR are often unrelated, or sometimes related as treatment, but seldom related as ADRs. This implies inferring a predictive model from samples with a skewed class distribution. Second, EHRs contain both standard and nonstandard abbreviations as well as misspellings, all of which lead to high lexical variability. Third, Spanish has few resources and tools for NLP. To cope with these challenges, we explored several ADR detection algorithms (Random Forest and Joint AB-LSTM) and representations (symbolic and dense) to characterize the ADR candidates. In addition, we assessed the tolerance of the ADR detection model to external noise, such as the incorrect detection of the medical entities involved in ADR extraction.
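To make the relation extraction framing concrete, here is a minimal sketch, assuming entities have already been detected and that every drug/disease pair in a text is treated as one ADR candidate. The candidate pairing scheme, the bag-of-words-between-mentions features (one possible symbolic representation), the inverse-frequency class weights (the same heuristic scikit-learn uses for `class_weight="balanced"`), and the `drop_entities` noise simulation are all illustrative assumptions, not the thesis's actual pipeline. The names `Entity`, `candidate_pairs`, `between_tokens`, `balanced_weights`, and `drop_entities` are hypothetical, and the classifier itself (e.g. Random Forest or Joint AB-LSTM) is out of scope here.

```python
import random
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str   # "DRUG" or "DISEASE" (hypothetical label set)
    start: int  # index of the first token of the mention
    end: int    # index one past the last token of the mention


def candidate_pairs(entities):
    """Pair every drug with every disease in the same text (assumed scheme).

    Most pairs will be unrelated or treatment relations, which is why the
    resulting ADR / NO_ADR labels are heavily skewed."""
    drugs = [e for e in entities if e.kind == "DRUG"]
    diseases = [e for e in entities if e.kind == "DISEASE"]
    return list(product(drugs, diseases))


def between_tokens(tokens, a, b):
    """Symbolic features: bag of words found between the two mentions."""
    left, right = (a, b) if a.start < b.start else (b, a)
    return Counter(tokens[left.end:right.start])


def balanced_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count(class)).

    The rare ADR class gets a larger weight to counter the class skew."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def drop_entities(entities, rate, seed=0):
    """Simulate entity-detection noise by dropping each mention with
    probability `rate` (a hypothetical stand-in for NER false negatives)."""
    rng = random.Random(seed)
    return [e for e in entities if rng.random() >= rate]


if __name__ == "__main__":
    # Hypothetical, hand-annotated sentence; "HTA" abbreviates hipertensión arterial.
    tokens = "paciente con HTA tratada con enalapril presenta tos seca".split()
    entities = [
        Entity("HTA", "DISEASE", 2, 3),
        Entity("enalapril", "DRUG", 5, 6),
        Entity("tos seca", "DISEASE", 7, 9),
    ]
    for drug, disease in candidate_pairs(entities):
        print(drug.text, "->", disease.text, dict(between_tokens(tokens, drug, disease)))
    print(balanced_weights(["ADR"] + ["NO_ADR"] * 4))
    print(len(drop_entities(entities, rate=0.3)), "entities kept after noise")
```

In this toy sentence the drug appears with two diseases: one is the condition it treats and the other is a plausible adverse reaction, so only one of the two candidates would be labeled as an ADR. Weighting the rare class more heavily is one common way to keep a classifier from simply predicting "no ADR" for every candidate.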

doi: 10.21437/IberSPEECH.2021-34

Santiso, S. (2021) Adverse Drug Reaction extraction on Electronic Health Records written in Spanish: A PhD Thesis overview. Proc. IberSPEECH 2021, 155-159, doi: 10.21437/IberSPEECH.2021-34.