LLMs boost adverse drug event detection, yet bias and hallucinations threaten trust

CO-EDP, VisionRI | Updated: 07-08-2025 09:20 IST | Created: 07-08-2025 09:20 IST

Artificial intelligence is accelerating advances in clinical safety, with large language models (LLMs) emerging as powerful tools to detect and manage adverse drug events (ADEs). A recent review details how these models are reshaping pharmacovigilance by increasing accuracy, efficiency, and scalability in monitoring drug-related harms. However, the research warns that deployment must proceed cautiously, under strong regulatory oversight and with active bias mitigation.

Published in the Journal of Clinical Medicine, the study "Large Language Models for Adverse Drug Events: A Clinical Perspective" examines 39 studies assessing LLMs in ADE detection. By synthesizing clinical performance, applications, and future challenges, the authors provide a comprehensive outlook on how AI is transforming the detection of drug safety risks across multiple domains.

How effectively do LLMs detect adverse drug events?

The review shows that LLMs outperform traditional rule-based and machine learning approaches in identifying ADEs from unstructured clinical narratives. Models such as BERT, GPT-4, and their healthcare-focused derivatives (ClinicalBERT, BioBERT) excel at extracting ADE-related information from electronic health records, physician notes, and medical literature. Their ability to recognize complex contextual clues enables detection of events that might otherwise be overlooked by standard coding systems.
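
To give a sense of how such extraction works in practice, the sketch below runs a token-classification (named-entity recognition) pass over a short clinical note using the open-source Hugging Face transformers library. The checkpoint name is a placeholder standing in for any ADE-fine-tuned ClinicalBERT or BioBERT derivative; the review does not prescribe a particular implementation.

    # Minimal sketch, assuming a token-classification model fine-tuned for
    # drug and adverse-event entities. The checkpoint ID below is a
    # placeholder, not a real published model.
    from transformers import pipeline

    MODEL_NAME = "example-org/clinicalbert-ade-ner"  # hypothetical checkpoint

    ner = pipeline(
        "token-classification",
        model=MODEL_NAME,
        aggregation_strategy="simple",  # merge word pieces into whole entity spans
    )

    note = (
        "Patient developed severe nausea and a diffuse rash two days after "
        "starting amoxicillin; the drug was discontinued."
    )

    for entity in ner(note):
        # Each prediction carries an entity label (e.g. DRUG, ADE), the matched
        # text span, character offsets, and a confidence score.
        print(entity["entity_group"], entity["word"], round(entity["score"], 3))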

In oncology, for example, SciBERT drastically reduced the time required for reviewing ADE reports, cutting workflows from nine weeks to just ten minutes. Similarly, other models have shown high accuracy in extracting drug–event relationships and reasons for medication discontinuation. These breakthroughs illustrate how AI-driven text analysis can streamline pharmacovigilance while maintaining or even improving sensitivity.

The findings also highlight that LLMs reduce the burden on healthcare professionals by automating tedious data review, allowing clinicians to focus on decision-making. Yet the authors caution that while these models are powerful, they are not infallible. Performance varies by clinical domain, and reliance on them without expert oversight could risk misclassification or missed signals.

What are the risks and limitations of using LLMs in clinical settings?

LLMs introduce new risks that must be carefully managed. The review notes that models can overpredict certain adverse events, particularly when trained on imbalanced datasets. They may also generate “hallucinations,” inventing associations that do not exist in the clinical record. These errors can compromise patient safety if used without verification.
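
One simple safeguard, sketched below with assumed data structures (the review does not specify an implementation), is to ground every model-extracted drug–event pair against the patient's structured medication list before it enters the safety record; anything that cannot be matched is routed to a human reviewer.

    # Minimal sketch of a grounding check for hallucinated associations.
    # The ExtractedAssociation class and medication list are illustrative.
    from dataclasses import dataclass

    @dataclass
    class ExtractedAssociation:
        drug: str
        event: str

    def is_grounded(assoc: ExtractedAssociation, medications: set) -> bool:
        """Keep only associations whose drug appears in the structured record."""
        return assoc.drug.lower() in {m.lower() for m in medications}

    medication_list = {"Amoxicillin", "Metformin"}
    candidates = [
        ExtractedAssociation("amoxicillin", "rash"),   # grounded in the record
        ExtractedAssociation("warfarin", "bleeding"),  # possible hallucination
    ]

    for assoc in candidates:
        verdict = "accept" if is_grounded(assoc, medication_list) else "flag for human review"
        print(f"{assoc.drug} -> {assoc.event}: {verdict}")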

Bias is another major concern. Models trained on data that underrepresent certain populations may propagate inequities in healthcare outcomes. This is especially critical in ADE detection, where differences in drug response across demographics must be accurately captured. Furthermore, interpretability remains a challenge, as clinicians may struggle to understand the reasoning behind AI-generated outputs.
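
A basic bias audit of the kind the authors call for can be as simple as comparing detection recall across demographic groups, as in the sketch below; the labels, predictions, and group assignments are synthetic placeholders, not data from the review.

    # Minimal sketch of a subgroup audit: ADE-detection recall per group.
    # All values here are synthetic and purely illustrative.
    from sklearn.metrics import recall_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]   # 1 = ADE documented in the record
    y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]   # model output
    groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        recall = recall_score([y_true[i] for i in idx], [y_pred[i] for i in idx])
        print(f"group {group}: recall = {recall:.2f}")

    # A large recall gap (here group B misses most of its ADEs) signals that
    # harms in one population are being under-detected, pointing to the need
    # for rebalanced training data or recalibration.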

The review also identifies significant computational demands as a barrier to widespread adoption. Implementing LLMs in real-time clinical workflows requires robust infrastructure, which may be beyond the reach of smaller healthcare organizations. Regulatory frameworks have yet to fully address these challenges, leaving questions about accountability unresolved.

Integrating LLMs into clinical decision-making must be accompanied by regulatory oversight, bias auditing, and transparent performance reporting. Without these safeguards, reliance on AI could lead to unintended consequences, undermining the very safety improvements these tools are designed to deliver.

How will LLMs shape the future of pharmacovigilance and patient safety?

The study outlines two parallel paths for the evolution of LLMs in healthcare. The first focuses on prospective clinical deployment, where AI systems support real-time decision-making under clinician supervision. This approach prioritizes patient safety, fairness, and compliance with regulatory standards. In this scenario, LLMs act as assistants, flagging potential ADEs but leaving final judgments to human experts.

The second path involves research-driven pipelines that leverage LLMs for large-scale pharmacovigilance. Here, models can process vast datasets from multiple sources, enabling early detection of safety signals and supporting drug safety studies. Advanced techniques such as federated learning may allow institutions to collaborate without compromising patient privacy, further enhancing the scope of AI applications.
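
The sketch below shows the core of one such technique, federated averaging, in which each institution trains locally and only model parameters are pooled; the toy PyTorch models and sample counts are assumptions for illustration, not the review's setup.

    # Minimal sketch of federated averaging (FedAvg): patient data never leaves
    # the institutions; only locally trained parameters are combined.
    import torch

    def federated_average(state_dicts, sample_counts):
        """Average local parameters, weighted by each site's dataset size."""
        total = sum(sample_counts)
        return {
            key: sum(sd[key] * (n / total) for sd, n in zip(state_dicts, sample_counts))
            for key in state_dicts[0]
        }

    # Two "hospitals" holding identical architectures with locally trained weights.
    local_models = [torch.nn.Linear(8, 2) for _ in range(2)]
    global_state = federated_average(
        [m.state_dict() for m in local_models],
        sample_counts=[1200, 400],  # hypothetical local dataset sizes
    )

    global_model = torch.nn.Linear(8, 2)
    global_model.load_state_dict(global_state)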

The review also highlights emerging uses of LLMs in conversational AI, where chatbots equipped with these models offer real-time support to both clinicians and patients. Such tools can provide guidance on medication risks, treatment adjustments, and monitoring, but must be carefully supervised to avoid misinterpretation of complex clinical information.

First published in: Devdiscourse