ChatGPT matches doctors in explaining rare eye disease

CO-EDP, VisionRI | Updated: 04-06-2025 09:53 IST | Created: 04-06-2025 09:53 IST

In a head-to-head comparison of patient education platforms, researchers have found that ChatGPT delivers medical information about congenital cataracts with quality comparable to that of experienced doctors, and with superior readability. The study, titled "Online Platform vs. Doctors: A Comparative Exploration of Congenital Cataract Patient Education from Virtual to Reality," was published today in Frontiers in Artificial Intelligence.

The research, led by a team from the Eye, Ear, Nose, and Throat Hospital of Fudan University, evaluated health information sourced from Google, ChatGPT, and clinical doctors. The goal was to determine which platform provided the most correct, complete, readable, helpful, and safe responses to frequently asked questions about congenital cataracts, a rare but serious pediatric eye condition. ChatGPT, particularly after readability enhancement, stood out for its accessible and accurate communication, highlighting the growing role of AI in public health education.

How well do online platforms answer medical questions?

The researchers constructed two comprehensive question banks, one based on popular Google searches and the other derived from queries commonly posed to pediatric ophthalmologists. Each question was classified into one of three categories: pathogenesis and epidemiology, management, and prognosis. Responses were then collected from Google search results, ChatGPT (GPT-4o mini), and two qualified doctors: a senior attending surgeon and an ophthalmology resident.
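The paper does not publish its exact prompts or collection pipeline, but the ChatGPT arm of such a study can be reproduced along these lines. The sketch below uses the OpenAI Python SDK; the prompt wording, the helper function, and the optional simplification instruction are illustrative assumptions, not the study's actual protocol.

```python
# Hypothetical sketch: collecting answers from GPT-4o mini with the OpenAI
# Python SDK (pip install openai). The prompt wording and the "simplify"
# instruction are assumptions for illustration only; the study does not
# publish its exact prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, simplify: bool = False) -> str:
    system = "You are answering patient questions about congenital cataracts."
    if simplify:
        # Mirrors the paper's readability-enhancement step: re-requesting the
        # answer at roughly a sixth-grade reading level.
        system += (" Answer at a sixth-grade reading level, using short "
                   "sentences and simple words.")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What causes congenital cataracts?", simplify=True))
```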

An expert panel of ophthalmologists rated the answers on five key metrics: correctness, completeness, readability, helpfulness, and safety. Google consistently scored the lowest across all criteria. Doctor 1, the experienced surgeon, achieved the highest marks for correctness and safety. However, ChatGPT responses, especially when simplified to a sixth-grade reading level, scored comparably in correctness and surpassed both doctors in readability and overall accessibility.

The readability analysis confirmed that ChatGPT initially generated responses more complex than ideal for lay audiences. Once readability enhancements were applied using simplified-language prompts, however, the AI's responses scored higher than every other group on Flesch Reading Ease (FRE) while significantly lowering the Flesch–Kincaid Grade Level (FKGL) and Dale–Chall Score (DCS), all crucial indicators of health literacy suitability.
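For readers unfamiliar with these metrics, all three can be computed with standard formulas. The sketch below uses the open-source textstat package on invented sample sentences; the study's own scoring pipeline is not described in this article. Higher FRE means easier text, while lower FKGL and DCS indicate a lower required reading level.

```python
# A minimal sketch of the three readability metrics using the open-source
# "textstat" package (pip install textstat). The sample sentences are
# invented for illustration.
import textstat

original = (
    "Congenital cataract is an opacification of the crystalline lens present "
    "at birth, frequently necessitating prompt surgical intervention to "
    "avert irreversible deprivation amblyopia."
)
simplified = (
    "A congenital cataract is a cloudy spot in a baby's eye lens. Doctors "
    "often treat it with surgery early so the eye can learn to see."
)

for label, text in [("original", original), ("simplified", simplified)]:
    fre = textstat.flesch_reading_ease(text)           # higher = easier to read
    fkgl = textstat.flesch_kincaid_grade(text)         # approximate US grade level
    dcs = textstat.dale_chall_readability_score(text)  # lower = fewer hard words
    print(f"{label}: FRE={fre:.1f}, FKGL={fkgl:.1f}, DCS={dcs:.1f}")
```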

What were the key strengths and weaknesses of each platform?

The study identified notable disparities between sources. Doctor 1 provided the most accurate and safe answers across both question sets, while Doctor 2, though less experienced, offered the most readable responses among human participants. ChatGPT's original outputs were longer, contained more complex vocabulary, and showed higher sentence density. After readability tuning, however, the AI produced shorter sentences with simpler words and a reduced percentage of difficult terms, traits associated with improved comprehension and engagement.

For helpfulness, ChatGPT was rated highest in the Google-derived question bank, suggesting that its natural language style and ability to anticipate user needs can deliver more holistic answers than search engine snippets. In contrast, Google answers often lacked completeness, cited outdated sources, and failed to meet basic JAMA accountability benchmarks, with only one of 30 webpages satisfying all four trustworthiness criteria (authorship, references, disclosure, and update date).
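The JAMA benchmarks amount to a simple four-item checklist. As a hedged illustration, the snippet below scores a hypothetical page record against them; in practice, including in this study, the fields are filled in by manually reviewing each webpage.

```python
# Illustrative only: scoring a webpage record against the four JAMA
# accountability benchmarks named above. The metadata dict is hypothetical.
JAMA_BENCHMARKS = ("authorship", "references", "disclosure", "update_date")

def jama_score(page: dict) -> int:
    """Count how many of the four benchmarks a page satisfies."""
    return sum(bool(page.get(criterion)) for criterion in JAMA_BENCHMARKS)

example_page = {
    "authorship": "Jane Doe, MD",
    "references": ["American Academy of Ophthalmology guideline, 2023"],
    "disclosure": None,           # no funding or conflict statement found
    "update_date": "2021-05-01",
}
print(f"{jama_score(example_page)} of 4 benchmarks met")  # -> 3 of 4
```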

Doctors fared better, especially in addressing practical questions on postoperative care and long-term visual rehabilitation. These topics were underrepresented in Google’s "People Also Ask" data, revealing a critical gap in mainstream online information and underscoring the importance of expert consultation for comprehensive care.

Can AI tools like ChatGPT be trusted for health education?

While the study supports ChatGPT's potential as an effective and scalable patient education tool, the authors caution against uncritical reliance. The most significant concern is the possibility of AI "hallucinations": plausible-sounding but inaccurate or fabricated medical content. Such misinformation, if trusted without clinical validation, could compromise patient outcomes.

Additionally, with the introduction of multimodal AI capabilities that can interpret images and documents, there are growing concerns about user privacy, especially in healthcare settings. Misuse of AI systems for unauthorized diagnostics or sharing of sensitive data could violate ethical standards and legal protections. The authors stress the need for regulatory frameworks that clearly define the roles, responsibilities, and limitations of AI in medical education and decision-making.

Another limitation identified in the study is the selection of only one experienced doctor and one resident to represent the medical perspective, which may not fully capture the variability in clinical communication styles. Future research is planned to include broader clinical input and test more advanced or paid versions of large language models.

Despite these concerns, the evidence suggests that AI tools like ChatGPT offer a powerful supplementary channel for delivering accessible health information, especially in contexts where traditional resources are limited or patients face literacy challenges. The simplified responses retained nearly all critical characteristics of the original content, showing no significant drop in correctness, completeness, or safety.

FIRST PUBLISHED IN: Devdiscourse