Gender gaps emerge in clinical use of ChatGPT

A new study examining the role of generative artificial intelligence in occupational medicine has revealed significant gender-related differences in how medical professionals interact with ChatGPT. The research, titled “Gender Differences in the Use of ChatGPT as Generative Artificial Intelligence for Clinical Research and Decision-Making in Occupational Medicine,” highlights critical disparities in AI engagement, learning outcomes, and user confidence among male and female participants.
Published in Healthcare, the research assessed gender-specific behavior when using ChatGPT-4 to analyze cases in occupational health. Participants evaluated complex cases related to asbestos-related disease, berylliosis, and metal sulfate allergy. The study captured each user's inputs and the AI's outputs, gauging diagnostic accuracy, the frequency of AI hallucinations, communication patterns, and satisfaction levels. The findings underscore the growing need for inclusive and bias-aware AI literacy in clinical environments.
Do men and women use generative AI differently in clinical settings?
The study identifies clear gender differences in the interaction style, learning improvement, and self-assessment results following AI usage. Female participants exhibited significantly greater knowledge gains after using ChatGPT, particularly in diagnosing asbestos-related cancers. While overall diagnostic accuracy did not differ significantly between genders, self-reported competence improved more among women than men after interacting with the AI tool.
Interestingly, prompts issued by female users led to a higher rate of AI-generated confabulations: factually inaccurate or fabricated content that appears coherent but is ultimately misleading. Despite this, the accuracy of final responses remained consistent across genders, indicating that the confabulations did not translate into worse clinical conclusions. Female participants also reported more substantial increases in confidence in their occupational medicine knowledge after the interaction.
In contrast, male users displayed relatively stable self-assessments before and after using ChatGPT, with minimal improvement in perceived competence. The findings suggest not only a difference in the interaction strategy but also varying impacts on how generative AI tools influence learning and self-efficacy across genders.
What are the risks of confabulations and gender-influenced communication styles?
A major concern raised by the study is the emergence of “confabulations”: plausible but factually incorrect responses generated by ChatGPT. These AI hallucinations are often difficult to detect, especially without expert knowledge. The research confirms that confabulations occurred more frequently in response to inputs from female participants, though this did not significantly alter final diagnostic accuracy.
The authors argue that language use and communication style may be linked to this phenomenon. Prior studies suggest that men tend to use more direct and content-focused language, whereas women often prioritize interpersonal dynamics and nuanced phrasing. These tendencies can influence how prompts are interpreted by large language models like ChatGPT, potentially leading to subtle biases in output structure and content reliability.
The study’s findings resonate with broader research on digital communication, showing that gender can influence how users interact with AI and how AI interprets those interactions. When applied in high-stakes clinical environments, such variability carries serious implications for patient safety, decision-making integrity, and the training of future healthcare providers.
What are the implications for AI integration in medical education and practice?
This research presents the first real-world insights into the gendered dynamics of generative AI use in occupational medicine. As AI becomes more integrated into diagnostic workflows, documentation, and clinical research, the study calls attention to the necessity of incorporating gender sensitivity into medical AI training and application.
The findings suggest that without proper guardrails, generative AI tools risk amplifying communication disparities, introducing bias, and eroding trust among users. The variability in user-AI interactions could lead to inconsistent educational outcomes or decision-making errors, especially when confabulations are not flagged or corrected by the system.
To address these challenges, the authors advocate for targeted AI literacy training that accounts for gender-specific interaction patterns and equips users with critical skills to evaluate AI output. Medical institutions should also invest in tools that detect and mitigate hallucinations while encouraging interdisciplinary collaboration between AI developers, clinicians, and educators.
The study also highlights the importance of designing AI systems that recognize diverse user strategies and adapt to a range of communication styles. Future research, according to the authors, should examine how other demographic factors, such as age, experience level, and cultural background, influence AI engagement in healthcare.
- FIRST PUBLISHED IN: Devdiscourse