AI steps into oncology boards: ChatGPT shows strong but flawed performance
The study shows that AI can be a reliable supporter of guideline adherence in oncology. Its recommendations were most consistent in cases involving breast and stomach cancers, where established staging systems and treatment algorithms are widely applied. However, alignment weakened when treatment required nuanced consideration of patient-specific conditions such as comorbidities, performance status, and quality-of-life factors.

Artificial intelligence is increasingly being tested in high-stakes medical environments, and a new study has raised the question of whether large language models can function alongside clinicians in critical cancer treatment planning. Researchers conducted a prospective trial to evaluate how ChatGPT-4.0 performed when compared to multidisciplinary tumor councils, the panels that decide treatment strategies for cancer patients.
Published in Healthcare, the study “ChatGPT Performance in Multi-Disciplinary Boards—Should AI Be a Member of Cancer Boards?” is the first prospective evaluation of an AI model’s real-time decision-making in oncology board settings. The findings reveal both encouraging alignment with established medical guidelines and critical shortcomings that limit AI’s readiness for independent decision-making in patient care.
How did AI perform compared to multidisciplinary tumor boards?
The research examined 100 cancer patients presented to tumor councils at Van Training and Research Hospital in Türkiye between November 2024 and January 2025. Each case was anonymized and structured into detailed summaries that included patient demographics, pathology results, radiology findings, and clinical histories. These case files were independently submitted to ChatGPT-4.0 to generate treatment recommendations, which were later compared with the actual decisions made by the councils.
The tumor boards most frequently recommended neoadjuvant treatment (45%) and surgery (36%), while the AI suggested surgery (39%) and neoadjuvant treatment (37%). The overlap between human and AI decision-making reached a statistically significant concordance rate of 76.4%. This level of agreement highlights the model’s ability to reproduce guideline-based treatment logic, especially in scenarios where clinical pathways are standardized and patient complexities are minimal.
Where did AI fall short in clinical judgment?
Despite a promising degree of agreement, 23.6% of cases showed divergence between the AI model and the tumor boards. These differences were not random but concentrated in scenarios demanding individualized decision-making. For example, when a gastric cancer patient was deemed inoperable due to vascular invasion, the tumor board recommended neoadjuvant treatment while the AI advised surgery. In another case involving advanced disease and poor prognosis, the board prioritized palliative care while the AI leaned toward more aggressive therapy.
These gaps underscore the current limitations of large language models in replicating the human-centered judgment that clinicians apply. Unlike physicians, ChatGPT cannot integrate psychosocial factors, patient preferences, or subtle clinical cues that often guide oncology decisions. The research team emphasized that AI lacks transparency in its reasoning, a challenge commonly described as the “black box” problem. This raises unresolved ethical and legal concerns about accountability if AI recommendations were to directly influence patient care.
The study’s findings add to a growing body of evidence that while AI can accelerate access to clinical guidelines and streamline treatment planning, it should not yet be viewed as an independent decision-maker. The authors stress that responsibility must remain with physicians, with AI functioning strictly as an advisory tool.
Can AI become a member of cancer boards in the future?
While ChatGPT demonstrated substantial alignment with tumor board decisions, it is not yet ready to act as a true member of oncology councils. For AI to be integrated into multidisciplinary workflows, several conditions must be met. These include the ability to interpret real-time patient data, greater transparency in its reasoning processes, and compliance with strict ethical and legal standards.
The study calls for further trials involving other models such as Med-PaLM and BioGPT, along with multi-center collaborations to validate findings across diverse hospital systems. Future research should also explore the integration of genomic profiles, laboratory results, and real-time imaging data into AI platforms to improve contextual accuracy.
Despite its limitations, the model has clear potential benefits. In low-resource healthcare systems, where access to oncology expertise is limited, AI could provide rapid treatment recommendations based on global guidelines. It could also assist specialists by summarizing literature, presenting alternative treatment options, and supporting clinical decision-making under time pressure.
FIRST PUBLISHED IN: Devdiscourse