Hospitals turn to AI chatbots for patient messages: Empathetic, efficient but potentially dangerous
Generative AI may be poised to reshape how healthcare professionals communicate with patients. Yet behind the promise of empathy and efficiency lies a stark caveat: unregulated deployment could jeopardize patient safety and trust, warns a new study published in npj Digital Medicine.
The systematic review “A Systematic Review of Early Evidence on Generative AI for Drafting Responses to Patient Messages” explores the potential of generative AI tools to draft empathetic, efficient, and high-quality replies to patient messages and examines the associated risks, adoption barriers, and ethical considerations.
As patient portals have become an integral part of electronic health record systems, the volume of messages directed at clinicians has surged dramatically, especially since the COVID-19 pandemic. In this environment, generative AI offers a potential solution to alleviate administrative strain. Yet, the study's findings reveal a complex picture: while AI shows strong promise in terms of quality and empathy, it also raises significant questions around safety, reliability, and transparency.
Can generative AI draft messages as effectively as clinicians?
The review analyzes whether generative AI tools can reliably match the quality of clinician-drafted responses. Based on an analysis of 23 empirical studies, the authors report that AI-generated replies often match, and in some cases surpass, human responses in perceived empathy and clarity. This capability is particularly noteworthy in low-risk, routine communications such as appointment scheduling, medication refills, or lab test clarifications.
Moreover, several studies indicated that patients frequently rated AI-generated responses as more empathetic than those written by clinicians. This highlights a potential benefit of AI systems in maintaining a sense of care and attentiveness, even in high-volume communication environments. These tools perform particularly well when they draw on well-curated data and when prompt engineering is used to frame responses in a compassionate tone.
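As an illustration only, and not the tooling evaluated in the reviewed studies, the sketch below shows how a compassionate-tone system prompt might be paired with a patient portal message to request a draft for clinician review. The draft_reply helper, the placeholder model name, and the OpenAI-style chat call are assumptions made for this example.

```python
# Illustrative sketch: pairing a compassionate-tone system prompt with a
# patient portal message to request a draft reply for clinician review.
# The model name and OpenAI-style client call are assumptions for this example.
from openai import OpenAI

SYSTEM_PROMPT = (
    "You draft replies to patient portal messages for a clinician to review. "
    "Use a warm, compassionate tone, plain language, and short paragraphs. "
    "Do not offer new diagnoses or medication changes; defer clinical "
    "decisions to the reviewing clinician and flag anything urgent."
)

def draft_reply(patient_message: str, model: str = "gpt-4o-mini") -> str:
    """Return a draft reply; a clinician must review and edit it before sending."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        temperature=0.3,  # keep drafts conservative and consistent
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": patient_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(draft_reply("Can I take ibuprofen with my blood pressure medication?"))
```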
Despite these encouraging signs, the review cautions against equating text fluency and style with substantive accuracy. Empathetic language alone is not a sufficient marker of effective clinical communication, especially when complex medical advice or nuanced decision-making is involved.
What are the risks and limitations of using AI for patient communication?
While the technical capabilities of generative AI are promising, the review points to critical risks that currently limit its safe deployment in clinical practice. Chief among them are concerns about accuracy. Several studies flagged cases of hallucinated information, outdated content, and incorrect medical advice being generated by AI models. When left unchecked, these errors could lead to harmful consequences for patient care.
Another major concern is patient safety, especially when generative AI is used without robust human oversight. Although some healthcare systems have piloted AI-assisted responses with clinician review, this hybrid model is far from standardized. Without consistent review protocols and accountability structures, the introduction of AI into patient messaging creates potential legal and ethical vulnerabilities.
Furthermore, transparency emerged as a serious gap. Many patients remain unaware when AI tools are used in their communications, and existing studies did not report uniform practices in disclosure. This raises ethical questions about informed consent and trust, particularly in cases where AI-generated advice significantly shapes patient decisions.
Liability also remains murky. It is unclear who bears responsibility when AI-generated communication results in harm: clinicians, developers, or institutions. The review highlights the lack of established policies or legal frameworks to address these grey areas.
Why isn’t generative AI widely adopted in real-world clinics?
Despite the technological potential, real-world integration of generative AI into clinical messaging systems remains limited. The authors identify several barriers to adoption, beginning with lack of trust among clinicians. Many providers remain skeptical about the reliability of AI tools, especially when there is no standardized vetting or evaluation framework.
In addition, technical integration challenges have slowed implementation. Many AI systems are not yet compatible with existing EHR platforms, and the workflows required to support clinician oversight can add rather than reduce burden. Furthermore, regulatory bodies have yet to issue comprehensive guidelines for the clinical use of generative AI, leaving health systems in a state of uncertainty.
The review also found wide variation in study design and quality across the reviewed literature. Most existing evaluations use non-standardized metrics, short-term testing, and small datasets, making it difficult to generalize findings or establish best practices. Without rigorous, repeatable evaluation methods, it is hard to build the evidence base required for large-scale deployment.
The lack of meaningful stakeholder engagement, particularly involving patients and frontline clinicians, has further contributed to the slow adoption. Few studies reported involving end users in tool development or deployment strategy, which undermines usability, trust, and accountability.
Building a safe and responsible AI framework in healthcare
The review concludes that while generative AI holds significant promise for improving the efficiency and empathy of patient-provider communication, its use must be guided by clear ethical principles, robust safety mechanisms, and participatory governance.
The authors call for the development of standardized evaluation frameworks to assess AI tools in real-world settings. These should include metrics for accuracy, empathy, clinical appropriateness, and patient satisfaction. Additionally, disclosure policies should be mandatory, ensuring that patients are informed whenever AI is involved in their care.
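As a purely illustrative sketch of what such a framework's data capture could look like, the record below scores a single reply along the dimensions named above. The field names and the 1-5 scale are assumptions for the example, not a schema proposed by the authors.

```python
# Illustrative sketch of a structured evaluation record; the field names and
# 1-5 scale are assumptions for this example, not a schema from the review.
from dataclasses import dataclass

@dataclass(frozen=True)
class MessageEvaluation:
    reply_id: str
    accuracy: int                  # 1-5: factually and clinically correct
    empathy: int                   # 1-5: warmth and attentiveness of tone
    clinical_appropriateness: int  # 1-5: safe, in-scope guidance
    patient_satisfaction: int      # 1-5: rated by the patient or a proxy
    ai_disclosed: bool             # was AI involvement disclosed to the patient?

    def mean_score(self) -> float:
        """Average across the four rated dimensions."""
        return (self.accuracy + self.empathy +
                self.clinical_appropriateness + self.patient_satisfaction) / 4
```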
The study further urges healthcare organizations to implement human-in-the-loop systems where clinicians oversee and validate AI-generated content before it reaches patients. Such oversight is critical not only for safety but also for maintaining professional accountability in a digitally augmented care environment.
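A minimal sketch of such a human-in-the-loop gate, assuming a simple draft-review-release workflow rather than any specific system piloted in the reviewed studies, is shown below: every AI draft stays pending until a named clinician approves or rejects it, and only approved text can be released to the patient.

```python
# Minimal human-in-the-loop sketch (illustrative assumption, not the workflow
# described in the paper): AI drafts are held until a clinician reviews them,
# and only explicitly approved text can be released.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DraftReply:
    patient_id: str
    ai_draft: str
    status: str = "pending_review"   # pending_review | approved | rejected
    reviewer: Optional[str] = None
    final_text: Optional[str] = None
    reviewed_at: Optional[datetime] = None

def clinician_review(draft: DraftReply, reviewer: str,
                     approved: bool, edited_text: Optional[str] = None) -> DraftReply:
    """Record the clinician's decision; no draft is sent without review."""
    draft.reviewer = reviewer
    draft.reviewed_at = datetime.now(timezone.utc)
    if approved:
        draft.status = "approved"
        draft.final_text = edited_text or draft.ai_draft
    else:
        draft.status = "rejected"
    return draft

def release_to_patient(draft: DraftReply) -> str:
    """Refuse to send anything that has not been explicitly approved."""
    if draft.status != "approved" or not draft.final_text:
        raise PermissionError("Draft has not been approved by a clinician.")
    return draft.final_text
```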
Lastly, the authors recommend that future development efforts involve multidisciplinary collaboration, bringing together ethicists, clinicians, patients, technologists, and regulators to co-create responsible systems.
First published in: Devdiscourse