MedLog protocol to bring transparency and oversight to AI in healthcare

A team of leading researchers and industry experts has introduced a new framework for bringing greater transparency and oversight to artificial intelligence in healthcare. Their study, titled “A Global Log for Medical AI” and published on arXiv, calls for the creation of a universal event-level logging protocol for medical AI, named MedLog.
The paper warns that the rapid deployment of large language models (LLMs) and other clinical AI systems in hospitals worldwide has outpaced the establishment of standardized monitoring tools. This lack of infrastructure, the authors argue, undermines efforts to measure real-world performance, detect bias, address safety issues, and maintain trust in AI-assisted care.
Filling the transparency gap in clinical AI
While traditional reporting frameworks such as TRIPOD+AI, STARD-AI, DECIDE-AI, SPIRIT-AI, and CONSORT-AI have guided trial-stage assessments, they do not provide the infrastructure for tracking AI behavior once systems are deployed in clinical environments. The absence of event-level monitoring, the authors note, leaves health systems unable to trace how AI models are used, which decisions they influence, and what clinical outcomes follow.
To address this gap, the team proposes MedLog, a standardized schema for logging every AI model invocation in clinical and operational settings. The protocol defines nine essential fields: header, model instance, user identity, target identity, inputs, internal artifacts, patient- or clinician-facing outputs, outcomes, and user feedback. Each record is assembled incrementally as the AI system operates, enabling retrospective reconstruction of events and linking AI activity to downstream clinical actions and results.
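To make the nine-field structure concrete, here is a minimal sketch of what an event-level record might look like in code. The field names, types, and helper function are illustrative assumptions for this article, not the schema defined in the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional
import uuid


@dataclass
class MedLogRecord:
    """Illustrative event-level record covering MedLog's nine fields.

    The concrete field names and types here are assumptions made for
    this sketch; the paper defines the actual schema.
    """
    # 1. Header: record ID, timestamp, site, schema version
    record_id: str
    timestamp: str
    site_id: str
    schema_version: str = "0.1"
    # 2. Model instance: which model, version, and configuration was invoked
    model_instance: dict[str, Any] = field(default_factory=dict)
    # 3. User identity: the clinician or service account that triggered the call
    user_identity: Optional[str] = None
    # 4. Target identity: the patient (or other subject) the invocation concerns
    target_identity: Optional[str] = None
    # 5. Inputs: prompts, features, or referenced data passed to the model
    inputs: dict[str, Any] = field(default_factory=dict)
    # 6. Internal artifacts: reasoning traces, intermediate tool calls, scores
    internal_artifacts: dict[str, Any] = field(default_factory=dict)
    # 7. Outputs: the patient- or clinician-facing result that was surfaced
    outputs: dict[str, Any] = field(default_factory=dict)
    # 8. Outcomes: downstream clinical actions and results, appended later
    outcomes: dict[str, Any] = field(default_factory=dict)
    # 9. User feedback: accept, override, or edit signals from the clinician
    user_feedback: dict[str, Any] = field(default_factory=dict)


def new_record(site_id: str, model_instance: dict[str, Any]) -> MedLogRecord:
    """Start a record at invocation time; outcomes and feedback arrive later."""
    return MedLogRecord(
        record_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        site_id=site_id,
        model_instance=model_instance,
    )
```

Because each record is assembled incrementally, the outcomes and user-feedback fields would typically start empty at invocation time and be filled in as downstream clinical actions and responses are linked back to the event.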
According to the authors, MedLog is designed to be lightweight enough for early adoption yet detailed enough to capture the complexities of modern multi-stage workflows and agentic AI systems. They argue that such a protocol would transform AI oversight in healthcare much as syslog revolutionized monitoring and troubleshooting in IT systems.
Strengthening safety, bias detection, and continuous oversight
Safety monitoring and bias detection are central benefits of adopting MedLog. By capturing real-time data on inputs, reasoning traces, outputs, and linked outcomes, MedLog would enable regulators and healthcare providers to detect adverse events, identify performance degradation caused by dataset shifts, and pinpoint disparities across demographic and socioeconomic subgroups.
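As a rough illustration of how such oversight could be computed from pooled records, the sketch below tallies error rates per demographic subgroup. The `error` flag, the `age_band` grouping key, and the record shape follow the hypothetical `MedLogRecord` above and are assumptions, not part of the protocol.

```python
from collections import defaultdict


def subgroup_error_rates(records, group_key="age_band"):
    """Compute per-subgroup error rates from completed MedLog-style records.

    Assumes each record carries the subgroup label in its inputs and a
    boolean 'error' flag derived from outcomes; both are assumptions here.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for rec in records:
        group = rec.inputs.get(group_key, "unknown")
        counts[group][1] += 1
        if rec.outcomes.get("error", False):
            counts[group][0] += 1
    return {g: errs / total for g, (errs, total) in counts.items() if total}
```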
The researchers highlight that traditional evaluation methods often fail to keep pace with rapidly changing clinical environments. For example, even small shifts in data distributions, such as those caused by changes in laboratory test kits, can compromise the accuracy of predictive models. The study cites a real-world case from Clalit Health Services, where a predictive tool for prioritizing chronic patients detected a significant distribution shift in a key laboratory marker after a new testing kit was introduced. Continuous monitoring through AI logging caught the issue early, preventing a decline in model performance and illustrating the importance of ongoing oversight.
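A monitoring job of the kind described in that case could, for example, compare the distribution of a logged laboratory marker between a reference window and a recent window. The sketch below uses a two-sample Kolmogorov–Smirnov test; the choice of test and the alpha threshold are illustrative and not drawn from the paper.

```python
from scipy.stats import ks_2samp


def detect_marker_shift(reference_values, recent_values, alpha=0.01):
    """Flag a distribution shift in a logged laboratory marker.

    'reference_values' and 'recent_values' would be extracted from the
    inputs field of MedLog-style records in two time windows; the test
    and threshold are illustrative choices, not part of MedLog.
    """
    stat, p_value = ks_2samp(reference_values, recent_values)
    return {"statistic": stat, "p_value": p_value, "shift_detected": p_value < alpha}
```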
MedLog records also allow for iterative model improvement by providing diagnostic information on error cases, near misses, uncertainty signals, and user feedback. This information can guide active learning, curriculum learning, or meta-learning strategies, helping models adapt to evolving conditions while ensuring safety and accountability. In addition, the logging system supports post-deployment evaluation and regulatory compliance, offering the artifact collection needed for medical algorithmic audits.
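One hypothetical way to turn such logs into an improvement loop is to rank completed records by error status, clinician overrides, and model uncertainty, then route the top of the list to review or retraining. The field names and weights below are assumptions for illustration.

```python
def select_for_review(records, uncertainty_threshold=0.8, limit=100):
    """Rank completed MedLog-style records for review or retraining.

    Prioritizes errors, clinician overrides, and low-confidence outputs;
    the field names and scoring weights are assumptions.
    """
    def priority(rec):
        score = 0.0
        if rec.outcomes.get("error"):
            score += 3.0
        if rec.user_feedback.get("overridden"):
            score += 2.0
        confidence = rec.internal_artifacts.get("confidence", 1.0)
        if confidence < uncertainty_threshold:
            score += 1.0 + (uncertainty_threshold - confidence)
        return score

    ranked = sorted(records, key=priority, reverse=True)
    return [rec for rec in ranked if priority(rec) > 0][:limit]
```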
Building a foundation for global accountability and trust
The authors frame MedLog as a catalyst for a new form of human–AI epidemiology, enabling systematic study of how AI influences healthcare delivery and patient trajectories. This capability would allow health systems to assess whether AI tools reduce or exacerbate health disparities across regions, demographic groups, and economic settings. De-identified records can be pooled for global benchmarking, offering insights into performance differences between resource-rich and resource-constrained hospitals.
The study notes that adoption of MedLog will require collaboration across healthcare providers, AI vendors, EHR system developers, and regulators. It calls for robust privacy and security measures consistent with HIPAA, GDPR, and ISO/IEC 27001 standards. To support low-resource environments, the protocol includes features such as risk-based sampling, lifecycle-aware retention policies, and write-behind caching, allowing systems with limited infrastructure to participate in AI monitoring without being overwhelmed by data demands.
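To show how risk-based sampling and write-behind caching might keep logging overhead manageable in such settings, here is a small sketch; the risk tiers, sampling rates, and batch size are placeholders chosen for illustration rather than values specified by the protocol.

```python
import random
from collections import deque

# Illustrative sampling rates per risk tier; a real policy would be set
# by the deploying institution and is not specified by this sketch.
SAMPLING_RATES = {"high": 1.0, "medium": 0.25, "low": 0.05}


def should_log(risk_tier: str) -> bool:
    """Risk-based sampling: always keep high-risk events, sample the rest."""
    return random.random() < SAMPLING_RATES.get(risk_tier, 1.0)


class WriteBehindBuffer:
    """Minimal write-behind cache: hold records locally, flush in batches."""

    def __init__(self, flush_fn, batch_size=50):
        self._queue = deque()
        self._flush_fn = flush_fn      # e.g., sends a batch to central storage
        self._batch_size = batch_size

    def append(self, record):
        self._queue.append(record)
        if len(self._queue) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._queue:
            batch = list(self._queue)
            self._queue.clear()
            self._flush_fn(batch)
```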
The authors call on policymakers to recognize the broader value proposition of MedLog. Beyond regulatory compliance and liability management, standardized logging can improve clinical learning, support value-based contracts for AI services, and build trust among clinicians by providing transparent records of AI recommendations. They argue that MedLog should follow the path of syslog, achieving broad adoption through “rough consensus and running code.”